ABSTRACT

Recent research in multi-robot exploration and mapping has 
focused on sampling environmental fields, which are typi- 
cally modeled using the Gaussian process (GP). Existing 
information-theoretic exploration strategies for learning GP- 
based environmental field maps adopt the non-Markovian 
problem structure and consequently scale poorly with the 
length of history of observations. Hence, it becomes
computationally impractical to use these strategies for in situ,
real-time active sampling. To ease this computational burden,
this paper presents a Markov-based approach to efficient 
information-theoretic path planning for active sampling of 
GP-based fields. We analyze the time complexity of solving 
the Markov-based path planning problem, and demonstrate 
analytically that it scales better than that of deriving the 
non-Markovian strategies with increasing length of planning 
horizon. For a class of exploration tasks called the transect 
sampling task, we provide theoretical guarantees on the ac- 
tive sampling performance of our Markov-based policy, from 
which ideal environmental field conditions and sampling task 
settings can be established to limit its performance degrada- 
tion due to violation of the Markov assumption. Empirical 
evaluation on real-world temperature and plankton density 
field data shows that our Markov-based policy can generally 
achieve active sampling performance comparable to that of 
the widely-used non-Markovian greedy policies under less 
favorable realistic field conditions and task settings while 
enjoying significant computational gain over them. 

Categories and Subject Descriptors 

G.3 [Probability and Statistics]: Markov processes, stochastic
processes; I.2.8 [Problem Solving, Control Methods, and
Search]: Dynamic programming; I.2.9 [Robotics]: Autonomous
vehicles

General Terms 

Algorithms, Performance, Experimentation, Theory 

Keywords 

Multi-robot exploration and mapping, adaptive sampling, 
active learning, Gaussian process, non-myopic path planning 

Cite as: Active Markov Information-Theoretic Path Planning for 
Robotic Environmental Sensing, Kian Hsiang Low, John M. Dolan, and 
Pradeep Khosla, Proc. of 10th Int. Conf. on Autonomous Agents 
and Multiagent Systems (AAMAS 2011), Tumer, Yolum, Sonenberg
and Stone (eds.), May 2-6, 2011, Taipei, Taiwan, pp. XXX-XXX.
Copyright © 2011, International Foundation for Autonomous Agents and
Multiagent Systems (www.ifaamas.org). All rights reserved. 



John M. Dolan and Pradeep Khosla

Robotics Institute 
Carnegie Mellon University 
Pittsburgh PA 15213 USA 

jmd@cs.cmu.edu, pkk@ece.cmu.edu 
1. INTRODUCTION 

Research in multi-robot exploration and mapping has re-
cently progressed from building occupancy grids [14] to sam-
pling spatially varying environmental phenomena [6, 7], in
particular, environmental fields (e.g., plankton density, pol-
lutant concentration, temperature fields) that are charac-
terized by continuous-valued, spatially correlated measure-
ments (see Fig. 1). Exploration strategies for building occu-
pancy grid maps usually operate under the assumptions of
(a) discrete, (b) independent cell occupancies, which impose, 
respectively, the following limitations for learning environ- 
mental field maps: these strategies (a) cannot be fully in- 
formed by the continuous field measurements and (b) cannot 
exploit the spatial correlation structure of an environmental 
field for selecting observation paths. As a result, occupancy 
grid mapping strategies are not capable of selecting the most 
informative observation paths for learning an environmental 
field map. 

Furthermore, occupancy grid mapping strategies typically 
assume that range sensing is available. In contrast, many in 
situ environmental and ecological sensing applications (e.g., 
monitoring of ocean phenomena, forest ecosystems, or pollu- 
tion) permit only point-based sensing, thus making a high- 
resolution sampling of the entire field impractical in terms 
of resource costs (e.g., energy consumption, mission time). 
In practice, the resource cost constraints restrict the spatial 
coverage of the observation paths. Fortunately, the spatial 
correlation structure of an environmental field enables a map 
of the field (in particular, its unobserved areas) to be learned 
using the point-based observations taken along the resource- 
constrained paths. To learn this map, a commonly-used 
approach in spatial statistics [15] is to assume that the envi- 
ronmental field is realized from a probabilistic model called 
the Gaussian process (GP) (Section 3.2). More importantly,
the GP model allows an environmental field to be formally 
characterized and consequently provides formal measures of 
mapping uncertainty (e.g., based on mean-squared error [6] 
or entropy criterion [7]) for directing a robot team to explore
highly uncertain areas of the field. In this paper, we focus on 
using the entropy criterion to measure mapping uncertainty. 

How then does a robot team plan the most informative 
resource-constrained observation paths to minimize the map- 
ping uncertainty of an environmental field? To address this, 
the work of [7] has proposed an information-theoretic multi-
robot exploration strategy that selects non-myopic observa-
tion paths with maximum entropy. Interestingly, this work 
has established an equivalence result that the maximum- 
entropy paths selected by such a strategy can achieve the 



dual objective of minimizing the mapping uncertainty de-
fined using the entropy criterion. When this strategy is ap-
plied to sampling a GP-based environmental field, it can be
reduced to solving a non-Markovian, deterministic planning 
problem called the information-theoretic multi-robot adap-
tive sampling problem (iMASP) (Section 3). Due to the
non-Markovian problem structure of iMASP, its state size 
grows exponentially with the length of planning horizon. To 
alleviate this computational difficulty, an anytime heuristic 
search algorithm called Learning Real-Time A* [2] is used to 
solve iMASP approximately. However, this algorithm does
not guarantee the performance of its induced exploration 
policy. We have also observed through experiments that 
when the joint action space of the robot team is large or the 
planning horizon is long, it no longer produces a good pol- 
icy fast enough. Even after incurring a huge amount of time 
and space to improve the search, its resulting policy still 
performs worse than the widely-used non-Markovian greedy 
policy, the latter of which can be derived efficiently by solv-
ing the myopic formulation of iMASP (Section 3.3).

Though the anytime and greedy algorithms provide some 
computational relief to solving iMASP (albeit approximately), 
they inherit iMASP's non-Markovian problem structure and
consequently scale poorly with the length of history of ob- 
servations. Hence, it becomes computationally impractical 
to use these non-Markovian path planning algorithms for in 
situ, real-time active sampling performed (a) at high resolu- 
tion (e.g., due to high sensor sampling rate or large sampling 
region), (b) over dynamic features of interest (e.g., algal 
blooms, oil spills), (c) with resource cost constraints (e.g., 
energy consumption, mission time), or (d) in the presence of 
dynamically changing external forces translating the robots 
(e.g., ocean drift on autonomous boats), thus requiring fast 
replanning. For example, the deployment of autonomous 
underwater vehicles (AUVs) and boats for ocean sampling 
poses the above challenges, among others [4].

To ease this computational burden, this paper proposes a 
principled Markov-based approach to efficient information- 
theoretic path planning for active sampling of GP-based en-
vironmental fields (Section 4), which we develop by assum-
ing the Markov property in iMASP planning. To the proba-
bilistic robotics community, such a move to achieve time effi- 
ciency is probably anticipated. However, the Markov prop- 
erty is often imposed without carefully considering or for- 
mally analyzing its consequence on the performance degra- 
dation while operating in non-Markovian environments. In 
particular, to what extent does the environmental structure 
affect the performance degradation due to violation of the 
Markov assumption? Motivated by this lack of treatment, 
our work in this paper is novel in demonstrating both theo-
retically and empirically the extent to which the degradation
of active sampling performance depends on the spatial cor- 
relation structure of an environmental field. An important 
practical consequence is that of establishing environmen- 
tal field conditions under which the Markov-based approach 
performs well relative to the non-Markovian iMASP-based 
policy while enjoying significant computational gain over it. 
The specific contributions of our work include: 
• analyzing the time complexity of solving the Markov-based information-theoretic path planning problem, and showing analytically that it scales better than that of deriving the non-Markovian strategies with increasing length of planning horizon (Section 4.1);
• providing theoretical guarantees on the active sampling performance of our Markov-based policy (Section 4.2) for a class of exploration tasks called the transect sampling task (Section 2), from which various ideal environmental field conditions and sampling task settings can be established to limit its performance degradation;
• empirically evaluating the active sampling performance and time efficiency of our Markov-based policy on real-world temperature and plankton density field data under less favorable realistic environmental field conditions and sampling task settings (Section 5).

2. TRANSECT SAMPLING TASK 

Fig. 1 illustrates the transect sampling task introduced
previously in [13]. A temperature field is spatially dis-
tributed over a 25 m × 150 m transect that is discretized into
a 5 × 30 grid of sampling locations comprising 30 columns,
each of which has 5 sampling locations. It can be observed 
that the number of columns is much greater than the number 
of sampling locations in each column; this property is assumed
to hold for every other transect.
The robots are constrained to simultaneously explore for- 
ward one column at a time from the leftmost to the right- 
most column of the transect such that each robot samples 
one location per column for a total of 30 locations. So, each 
robot's action space given its current location consists of 
moving to any of the 5 locations in the adjacent column on 
its right. The number of robots is assumed not to be larger 
than the number of sampling locations per column. We as- 
sume that an adversary chooses the starting robot locations 
in the leftmost column and the robots will only know them 
at the time of deployment; such an adversary can be the 
dynamically changing external forces translating the robots 
(e.g., ocean drift on autonomous boats) or the unknown ob- 
stacles occupying potential starting locations. The robots 
are allowed to end at any location in the rightmost column. 
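As a minimal illustration of the task geometry just described, the sketch below builds the 5 × 30 grid of Fig. 1 and enumerates the joint action space of k robots moving forward into the adjacent column. It assumes that robots never share a sampling location, so a joint action is an unordered choice of k of the r locations in the next column (which agrees with the action-space size used in Section 4.1); the function names are hypothetical.

```python
from itertools import combinations
from math import comb


def transect_grid(n_cols=30, n_rows=5):
    """Sampling locations as (column, row) index pairs; column 0 is the leftmost."""
    return [(c, r) for c in range(n_cols) for r in range(n_rows)]


def joint_actions(n_rows, k):
    """Joint actions of k robots moving forward into the adjacent column.

    Assumes the k robots occupy distinct rows, so a joint action is an
    unordered set of k target rows. Since each robot may move to any row of
    the next column, the set does not depend on the robots' current rows.
    """
    if not 1 <= k <= n_rows:
        raise ValueError("the task assumes 1 <= k <= rows per column")
    return list(combinations(range(n_rows), k))


grid = transect_grid()
print(len(grid))                                          # 150
print([len(joint_actions(5, k)) for k in (1, 2, 3, 4)])   # [5, 10, 10, 5]
assert all(len(joint_actions(5, k)) == comb(5, k) for k in range(1, 6))
```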

In practice, the constraint on exploring forward in a tran- 
sect sampling task permits the planning of less complex ob- 
servation paths that can be achieved more reliably, using 
less sophisticated control algorithms, and by robots with 
limited maneuverability (e.g., unmanned aerial vehicles, au- 
tonomous boats and AUVs [9]). For practical applications, 
while the robot is in transit from its current location to a 
distant planned waypoint [13], this task can be performed
to collect the most informative observations during transit. 
In monitoring of ocean phenomena and freshwater quality 
along rivers, the transect can span a plankton density or 
temperature field drifting at a constant rate from right to 
left and the autonomous boats are tasked to explore within 
a line perpendicular to the drift. As another example, the 
transect can be the bottom surface of a ship hull or other mar-
itime structure to be inspected and mapped by AUVs.

3. NON-MARKOVIAN PATH PLANNING 
3.1 Notations and Preliminaries 

Let $\mathcal{U}$ be the domain of the environmental field representing a set of sampling locations in the transect such that each location $u \in \mathcal{U}$ yields a measurement $z_u$. The columns of the transect are indexed in an increasing order from left to right, with the leftmost column being indexed 0. Each planning stage is associated with a column from which every robot in the team selects and takes an observation (i.e., comprising a pair of location and its measurement). Let k denote the number of robots in the team. In each stage i, the team of k robots then collects from column i a total of k observations, which are denoted by a pair of vectors $x_i$ of k locations and $z_{x_i}$ of the corresponding measurements. Let $x_{0:i}$ and $z_{x_{0:i}}$ denote vectors comprising the histories of robots' sampling locations and corresponding measurements over stages 0 to i (i.e., concatenations of $x_0, x_1, \ldots, x_i$ and $z_{x_0}, z_{x_1}, \ldots, z_{x_i}$), respectively. Let $Z_u$, $Z_{x_i}$, and $Z_{x_{0:i}}$ be random measurements that are associated with the realizations $z_u$, $z_{x_i}$, and $z_{x_{0:i}}$, respectively.

Figure 1: Transect sampling task on a temperature field (measured in °C) spatially distributed over a 25 m × 150 m transect that is discretized into a 5 × 30 grid of sampling locations (white dots).

3.2 Gaussian Process-Based Environmental Field 

The GP model can be used to formally characterize an environmental field as follows: the environmental field is defined to vary as a realization of a GP. Let $\{Z_u\}_{u \in \mathcal{U}}$ denote a GP, i.e., every finite subset of $\{Z_u\}_{u \in \mathcal{U}}$ has a multivariate Gaussian distribution [8]. The GP is fully specified by its mean $\mu_u \triangleq \mathbb{E}[Z_u]$ and covariance $\sigma_{uv} \triangleq \mathrm{cov}[Z_u, Z_v]$ for all $u, v \in \mathcal{U}$. We assume that the GP is second-order stationary, i.e., it has a constant mean and a stationary covariance structure (i.e., $\sigma_{uv}$ is a function of $u - v$ for all $u, v \in \mathcal{U}$). In particular, its covariance structure is defined by the widely-used squared exponential covariance function [8]

$$\sigma_{uv} \triangleq \sigma_s^2 \exp\left\{-\frac{1}{2}(u - v)^\top M^{-2} (u - v)\right\} + \sigma_n^2 \delta_{uv} \qquad (1)$$

where $\sigma_s^2$ is the signal variance, $\sigma_n^2$ is the noise variance, M is a diagonal matrix with length-scale components $\ell_1$ and $\ell_2$ in the horizontal and vertical directions of a transect, respectively, and $\delta_{uv}$ is a Kronecker delta of value 1 if $u = v$, and 0 otherwise. Intuitively, the signal and noise variances describe, respectively, the intensity and noise of the field measurements, while the length-scale can be interpreted as the approximate distance to be traversed in a transect for the field measurement to change considerably [8]; it therefore controls the degree of spatial correlation or "similarity" between field measurements. In this paper, the mean and covariance structure of the GP are assumed to be known. Given that the robot team has collected observations $x_0, z_{x_0}, x_1, z_{x_1}, \ldots, x_i, z_{x_i}$ over stages 0 to i, the distribution of $Z_u$ remains Gaussian with the following posterior mean and covariance

$$\mu_{u|x_{0:i}} = \mu_u + \Sigma_{u x_{0:i}} \Sigma_{x_{0:i} x_{0:i}}^{-1} \left(z_{x_{0:i}} - \mu_{x_{0:i}}\right) \qquad (2)$$

$$\sigma_{uv|x_{0:i}} = \sigma_{uv} - \Sigma_{u x_{0:i}} \Sigma_{x_{0:i} x_{0:i}}^{-1} \Sigma_{x_{0:i} v} \qquad (3)$$

where $\mu_{x_{0:i}}$ is a column vector with mean components $\mu_w$ for every location w of $x_{0:i}$, $\Sigma_{u x_{0:i}}$ is a row vector with covariance components $\sigma_{uw}$ for every location w of $x_{0:i}$, $\Sigma_{x_{0:i} v}$ is a column vector with covariance components $\sigma_{wv}$ for every location w of $x_{0:i}$, and $\Sigma_{x_{0:i} x_{0:i}}$ is a covariance matrix with components $\sigma_{wy}$ for every pair of locations w, y of $x_{0:i}$. Note that the posterior mean $\mu_{u|x_{0:i}}$ (2) is the best unbiased predictor of the measurement $z_u$ at unobserved location u. An important property of GP is that the posterior covariance $\sigma_{uv|x_{0:i}}$ (3) is independent of the observed measurements $z_{x_{0:i}}$; this property is used to reduce iMASP to a deterministic planning problem as shown later.
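To make (1)-(3) concrete, here is a dependency-free sketch of the squared exponential covariance and the GP posterior mean and covariance. It assumes locations are (horizontal, vertical) coordinates in the same units as the length-scales and uses a hand-written Cholesky factorization in place of a linear-algebra library; the hyperparameter values are made up for illustration and the helper names are hypothetical. The final line checks the property noted above: changing the measurements changes the posterior mean but not the posterior covariance.

```python
import math


def se_cov(u, v, sig_s2, sig_n2, l1, l2):
    """Squared exponential covariance (1) with diagonal length-scales l1, l2."""
    d2 = ((u[0] - v[0]) / l1) ** 2 + ((u[1] - v[1]) / l2) ** 2
    return sig_s2 * math.exp(-0.5 * d2) + (sig_n2 if u == v else 0.0)


def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][p] * low[j][p] for p in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def forward_solve(low, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i, row in enumerate(low):
        y.append((b[i] - sum(row[p] * y[p] for p in range(i))) / row[i])
    return y


def back_solve(low, y):
    """Solve L^T x = y by back substitution."""
    n = len(low)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[p][i] * x[p] for p in range(i + 1, n))) / low[i][i]
    return x


def gp_posterior(u, v, obs_locs, obs_z, mean, kern):
    """Posterior mean of Z_u as in (2) and posterior covariance of Z_u, Z_v as in (3)."""
    low = cholesky([[kern(a, b) for b in obs_locs] for a in obs_locs])
    k_u = [kern(u, w) for w in obs_locs]
    k_v = [kern(w, v) for w in obs_locs]
    alpha = back_solve(low, forward_solve(low, [z - mean for z in obs_z]))
    mu = mean + sum(ku * al for ku, al in zip(k_u, alpha))
    w_u, w_v = forward_solve(low, k_u), forward_solve(low, k_v)
    cov = kern(u, v) - sum(p * q for p, q in zip(w_u, w_v))  # never touches obs_z
    return mu, cov


def kern(a, b):
    # Illustrative hyperparameters only (not the values learned in Section 5).
    return se_cov(a, b, sig_s2=1.0, sig_n2=0.01, l1=2.0, l2=1.0)


locs = [(0.0, 0.0), (0.0, 1.0)]      # two observed locations in column 0
u = (1.0, 0.0)                       # an unobserved location in column 1
mu1, var1 = gp_posterior(u, u, locs, [0.5, -0.2], 0.0, kern)
mu2, var2 = gp_posterior(u, u, locs, [3.0, 1.0], 0.0, kern)
print(var1 == var2, mu1 != mu2)      # True True
```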

3.3 Deterministic iMASP Planning

Supposing the robot team starts in locations $x_0$ of leftmost column 0, an exploration policy is responsible for directing it to sample locations $x_1, x_2, \ldots, x_{t+1}$ of the respective columns $1, 2, \ldots, t+1$ to form the observation paths. Formally, a non-Markovian policy is denoted by $\pi = (\pi_0(x_{0:0} = x_0), \pi_1(x_{0:1}), \ldots, \pi_t(x_{0:t}))$ where $\pi_i(x_{0:i})$ maps the history $x_{0:i}$ of robots' sampling locations to a vector $a_i \in \mathcal{A}(x_i)$ of robots' actions in stage i (i.e., $a_i \leftarrow \pi_i(x_{0:i})$), and $\mathcal{A}(x_i)$ is the joint action space of the robots given their current locations $x_i$. We assume that the transition function $\tau(x_i, a_i)$ deterministically (i.e., no localization uncertainty) moves the robots to their next locations $x_{i+1}$ in stage $i+1$ (i.e., $x_{i+1} \leftarrow \tau(x_i, a_i)$). Putting $\pi_i$ and $\tau$ together yields the assignment $x_{i+1} \leftarrow \tau(x_i, \pi_i(x_{0:i}))$.

The work of [7] has proposed a non-Markovian policy $\pi^*$ that selects non-myopic observation paths with maximum entropy for sampling a GP-based field. To know how $\pi^*$ is derived, we first define the value under a policy $\pi$ to be the entropy of observation paths when starting in $x_0$ and following $\pi$ thereafter:

$$V_0^{\pi}(x_0) \triangleq \mathbb{H}[Z_{x_{1:t+1}} \mid Z_{x_0}, \pi] = -\int f(z_{x_{0:t+1}} \mid \pi) \log f(z_{x_{1:t+1}} \mid z_{x_0}, \pi)\, dz_{x_{0:t+1}} \qquad (4)$$

where f denotes a Gaussian probability density function. When a non-Markovian policy $\pi$ is plugged into (4), the following (t+1)-stage recursive formulation results from the chain rule for entropy and $x_{i+1} \leftarrow \tau(x_i, \pi_i(x_{0:i}))$:

$$\begin{aligned}
V_i^{\pi}(x_{0:i}) &= \mathbb{H}[Z_{x_{i+1}} \mid Z_{x_{0:i}}, \pi_i] + V_{i+1}^{\pi}(x_{0:i+1}) \\
&= \mathbb{H}[Z_{\tau(x_i, \pi_i(x_{0:i}))} \mid Z_{x_{0:i}}] + V_{i+1}^{\pi}\big((x_{0:i}, \tau(x_i, \pi_i(x_{0:i})))\big) \\
V_t^{\pi}(x_{0:t}) &= \mathbb{H}[Z_{x_{t+1}} \mid Z_{x_{0:t}}, \pi_t] = \mathbb{H}[Z_{\tau(x_t, \pi_t(x_{0:t}))} \mid Z_{x_{0:t}}]
\end{aligned} \qquad (5)$$

for stage $i = 0, \ldots, t-1$, such that each stagewise posterior entropy (i.e., of the measurements $Z_{x_{i+1}}$ to be observed in stage $i+1$ given the history of measurements $Z_{x_{0:i}}$ observed from stages 0 to i) reduces to

$$\mathbb{H}[Z_{x_{i+1}} \mid Z_{x_{0:i}}] = \frac{1}{2} \log (2\pi e)^k \left|\Sigma_{x_{i+1} \mid x_{0:i}}\right| \qquad (6)$$

where $\Sigma_{x_{i+1}|x_{0:i}}$ is a covariance matrix with components $\sigma_{uv|x_{0:i}}$ for every pair of locations u, v of $x_{i+1}$, each of which is independent of observed measurements $z_{x_{0:i}}$ by (3), as discussed above. So, $\mathbb{H}[Z_{x_{i+1}} \mid Z_{x_{0:i}}]$ can be evaluated in closed form, and the value functions (5) only require the history of robots' sampling locations $x_{0:i}$ as inputs but not that of corresponding measurements $z_{x_{0:i}}$.
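Since (6) depends only on locations, the stagewise entropy can be computed before any measurement is taken. The sketch below evaluates (6) for a candidate next column given a history of locations. It is a minimal illustration under stated assumptions: the kernel hyperparameters are made up and expressed in grid units (grid widths of 1 assumed), it carries its own small Cholesky helpers so that it runs on its own, and the helper names are hypothetical.

```python
import math


def se_cov(u, v, sig_s2=1.0, sig_n2=0.01, l1=2.0, l2=1.0):
    """Squared exponential covariance (1); illustrative hyperparameters in grid units."""
    d2 = ((u[0] - v[0]) / l1) ** 2 + ((u[1] - v[1]) / l2) ** 2
    return sig_s2 * math.exp(-0.5 * d2) + (sig_n2 if u == v else 0.0)


def cholesky(a):
    """Lower-triangular L with L L^T = a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][p] * low[j][p] for p in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def forward_solve(low, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i, row in enumerate(low):
        y.append((b[i] - sum(row[p] * y[p] for p in range(i))) / row[i])
    return y


def posterior_cov(targets, given, kern=se_cov):
    """Sigma_{targets | given} from (3): uses locations only, never measurements."""
    prior = [[kern(a, b) for b in targets] for a in targets]
    if not given:
        return prior
    low = cholesky([[kern(a, b) for b in given] for a in given])
    w = [forward_solve(low, [kern(g, t) for g in given]) for t in targets]
    n = len(targets)
    return [[prior[i][j] - sum(p * q for p, q in zip(w[i], w[j])) for j in range(n)]
            for i in range(n)]


def gaussian_entropy(cov):
    """(6): 0.5 * log((2 pi e)^k |cov|), with the log-determinant taken via Cholesky."""
    low = cholesky(cov)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(len(cov)))
    return 0.5 * (len(cov) * math.log(2.0 * math.pi * math.e) + log_det)


history = [(0, 1), (0, 3)]           # x_0: k = 2 robots in column 0
nxt = [(1, 0), (1, 2)]               # a candidate x_1 in column 1
h_prior = gaussian_entropy(posterior_cov(nxt, []))
h_post = gaussian_entropy(posterior_cov(nxt, history))
print(h_post < h_prior)              # True: conditioning reduces entropy
```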

Solving iMASP involves choosing $\pi$ to maximize $V_0^{\pi}(x_0)$ (5), which yields the optimal policy $\pi^*$. Plugging $\pi^*$ into (5) gives the (t+1)-stage dynamic programming equations:

$$\begin{aligned}
V_i^{\pi^*}(x_{0:i}) &= \max_{a_i \in \mathcal{A}(x_i)} \left\{ \mathbb{H}[Z_{\tau(x_i, a_i)} \mid Z_{x_{0:i}}] + V_{i+1}^{\pi^*}\big((x_{0:i}, \tau(x_i, a_i))\big) \right\} \\
V_t^{\pi^*}(x_{0:t}) &= \max_{a_t \in \mathcal{A}(x_t)} \mathbb{H}[Z_{\tau(x_t, a_t)} \mid Z_{x_{0:t}}]
\end{aligned} \qquad (7)$$

for stage $i = 0, \ldots, t-1$. Since each stagewise posterior entropy $\mathbb{H}[Z_{\tau(x_i, a_i)} \mid Z_{x_{0:i}}]$ (6) can be evaluated in closed form as explained above, iMASP for sampling the GP-based field (7) reduces to a deterministic planning problem. Furthermore, it turns out to be the well-known maximum entropy sampling problem [10], as demonstrated in [7]. Policy $\pi^* = (\pi_0^*(x_{0:0}), \ldots, \pi_t^*(x_{0:t}))$ can be determined by

$$\begin{aligned}
\pi_i^*(x_{0:i}) &= \arg\max_{a_i \in \mathcal{A}(x_i)} \left\{ \mathbb{H}[Z_{\tau(x_i, a_i)} \mid Z_{x_{0:i}}] + V_{i+1}^{\pi^*}\big((x_{0:i}, \tau(x_i, a_i))\big) \right\} \\
\pi_t^*(x_{0:t}) &= \arg\max_{a_t \in \mathcal{A}(x_t)} \mathbb{H}[Z_{\tau(x_t, a_t)} \mid Z_{x_{0:t}}]
\end{aligned} \qquad (8)$$

for stage $i = 0, \ldots, t-1$. Similar to the optimal value functions (7), $\pi^*$ only requires the history of robots' sampling locations as inputs. So, $\pi^*$ can generate the maximum-entropy paths prior to exploration.
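The following sketch solves (7)-(8) by brute-force recursion over location histories on a deliberately tiny transect, to show why the state must carry the whole history $x_{0:i}$ and why the cost explodes with the horizon. It is an illustrative exhaustive search, not the Learning Real-Time A* solver discussed in Section 1; the grid size, the kernel hyperparameters (in grid units), and the helper names are assumptions. The accompanying check uses the chain rule: the optimal value equals the joint entropy (4) of the returned paths.

```python
import math
from itertools import combinations


def se_cov(u, v, sig_s2=1.0, sig_n2=0.01, l1=1.5, l2=1.0):
    """Squared exponential covariance (1); illustrative hyperparameters in grid units."""
    d2 = ((u[0] - v[0]) / l1) ** 2 + ((u[1] - v[1]) / l2) ** 2
    return sig_s2 * math.exp(-0.5 * d2) + (sig_n2 if u == v else 0.0)


def cholesky(a):
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][p] * low[j][p] for p in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def forward_solve(low, b):
    y = []
    for i, row in enumerate(low):
        y.append((b[i] - sum(row[p] * y[p] for p in range(i))) / row[i])
    return y


def cond_entropy(targets, given):
    """H[Z_targets | Z_given] via the posterior covariance (3) and entropy (6)."""
    n = len(targets)
    cov = [[se_cov(a, b) for b in targets] for a in targets]
    if given:
        low = cholesky([[se_cov(a, b) for b in given] for a in given])
        w = [forward_solve(low, [se_cov(g, t) for g in given]) for t in targets]
        cov = [[cov[i][j] - sum(p * q for p, q in zip(w[i], w[j])) for j in range(n)]
               for i in range(n)]
    low = cholesky(cov)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return 0.5 * (n * math.log(2.0 * math.pi * math.e) + log_det)


def imasp(history, col, n_cols, n_rows, k):
    """Optimal value V_col^{pi*}(history) of (7) and the maximizing rows per stage (8)."""
    best_val, best_path = -math.inf, None
    for rows in combinations(range(n_rows), k):
        nxt = [(col + 1, r) for r in rows]
        val, path = cond_entropy(nxt, history), [rows]
        if col + 1 < n_cols - 1:                      # stages remain after this one
            sub_val, sub_path = imasp(history + nxt, col + 1, n_cols, n_rows, k)
            val, path = val + sub_val, path + sub_path
        if val > best_val:
            best_val, best_path = val, path
    return best_val, best_path


x0 = [(0, 0)]                                         # one robot starting in row 0
value, path = imasp(x0, 0, n_cols=5, n_rows=3, k=1)   # 3^4 = 81 complete paths
print(path)                                           # maximum-entropy rows for columns 1..4
```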

Solving the myopic formulation of iMASP (7) is often considered to ease computation (Section 4.1), which entails deriving the non-Markovian greedy policy $\pi^{E} = (\pi_0^{E}(x_{0:0}), \ldots, \pi_t^{E}(x_{0:t}))$ where, for stage $i = 0, \ldots, t$,

$$\pi_i^{E}(x_{0:i}) = \arg\max_{a_i \in \mathcal{A}(x_i)} \mathbb{H}[Z_{\tau(x_i, a_i)} \mid Z_{x_{0:i}}] \qquad (9)$$

The work of [3] has proposed a non-Markovian greedy policy $\pi^{M} = (\pi_0^{M}(x_{0:0}), \ldots, \pi_t^{M}(x_{0:t}))$ to approximately maximize the closely related mutual information criterion:

$$\pi_i^{M}(x_{0:i}) = \arg\max_{a_i \in \mathcal{A}(x_i)} \mathbb{H}[Z_{\tau(x_i, a_i)} \mid Z_{x_{0:i}}] - \mathbb{H}[Z_{\tau(x_i, a_i)} \mid Z_{\bar{x}_{0:i+1}}] \qquad (10)$$

for stage $i = 0, \ldots, t$, where $\bar{x}_{0:i+1}$ denotes the vector comprising locations of domain $\mathcal{U}$ not found in $(x_{0:i}, \tau(x_i, a_i))$. It is shown in [3] that $\pi^{M}$ greedily selects new sampling locations that maximize the increase in mutual information. As noted in [7], this strategy is deficient in that it may not necessarily minimize the mapping uncertainty defined using the entropy criterion. More importantly, it suffers a huge computational drawback: the time needed to derive $\pi^{M}$ depends on the map resolution (i.e., $|\mathcal{U}|$) (Section 4.1).
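Both greedy rules can be sketched in a few lines on a tiny transect. The point to notice is the extra term in (10): scoring a candidate under $\pi^{M}$ conditions on every unobserved location $\bar{x}_{0:i+1}$ of the domain, which is what ties its cost to $|\mathcal{U}|$. The grid size, kernel hyperparameters (in grid units), and function names below are illustrative assumptions, not the implementation used in the experiments.

```python
import math
from itertools import combinations


def se_cov(u, v, sig_s2=1.0, sig_n2=0.01, l1=1.5, l2=1.0):
    """Squared exponential covariance (1); illustrative hyperparameters in grid units."""
    d2 = ((u[0] - v[0]) / l1) ** 2 + ((u[1] - v[1]) / l2) ** 2
    return sig_s2 * math.exp(-0.5 * d2) + (sig_n2 if u == v else 0.0)


def cholesky(a):
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][p] * low[j][p] for p in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def forward_solve(low, b):
    y = []
    for i, row in enumerate(low):
        y.append((b[i] - sum(row[p] * y[p] for p in range(i))) / row[i])
    return y


def cond_entropy(targets, given):
    """H[Z_targets | Z_given] via the posterior covariance (3) and entropy (6)."""
    n = len(targets)
    cov = [[se_cov(a, b) for b in targets] for a in targets]
    if given:
        low = cholesky([[se_cov(a, b) for b in given] for a in given])
        w = [forward_solve(low, [se_cov(g, t) for g in given]) for t in targets]
        cov = [[cov[i][j] - sum(p * q for p, q in zip(w[i], w[j])) for j in range(n)]
               for i in range(n)]
    low = cholesky(cov)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return 0.5 * (n * math.log(2.0 * math.pi * math.e) + log_det)


def greedy_path(x0, n_cols, n_rows, k, criterion="entropy"):
    """Roll out pi^E (9) or, with criterion='mutual_info', pi^M (10) from x0."""
    domain = [(c, r) for c in range(n_cols) for r in range(n_rows)]
    history, path = list(x0), []
    for col in range(n_cols - 1):
        best_rows, best_score = None, -math.inf
        for rows in combinations(range(n_rows), k):
            nxt = [(col + 1, r) for r in rows]
            score = cond_entropy(nxt, history)
            if criterion == "mutual_info":
                # Conditions on all unobserved locations, so the cost grows with |U|.
                rest = [u for u in domain if u not in history and u not in nxt]
                score -= cond_entropy(nxt, rest)
            if score > best_score:
                best_rows, best_score = rows, score
        history += [(col + 1, r) for r in best_rows]
        path.append(best_rows)
    return path


x0 = [(0, 0), (0, 2)]                   # k = 2 robots in column 0
print(greedy_path(x0, n_cols=6, n_rows=4, k=2))
print(greedy_path(x0, n_cols=6, n_rows=4, k=2, criterion="mutual_info"))
```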



4. MARKOV-BASED PATH PLANNING 

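The Markov property assumes that the measurements $Z_{x_{i+1}}$ to be observed next in stage $i+1$ depend only on the current measurements $Z_{x_i}$ observed in stage i and are conditionally independent of the past measurements $Z_{x_{0:i-1}}$ observed from stages 0 to $i-1$. That is, $f(z_{x_{i+1}} \mid z_{x_{0:i}}) = f(z_{x_{i+1}} \mid z_{x_i})$ for all $z_{x_{0:i}}, z_{x_{i+1}}$. As a result, $\mathbb{H}[Z_{x_{i+1}} \mid Z_{x_{0:i}}]$ (6) can be approximated by $\mathbb{H}[Z_{x_{i+1}} \mid Z_{x_i}]$. It is therefore straightforward to impose the Markov assumption on iMASP (7), which yields the following dynamic programming equations for the Markov-based path planning problem:

$$\begin{aligned}
\breve{V}_i(x_i) &= \max_{a_i \in \mathcal{A}(x_i)} \left\{ \mathbb{H}[Z_{\tau(x_i, a_i)} \mid Z_{x_i}] + \breve{V}_{i+1}(\tau(x_i, a_i)) \right\} \\
\breve{V}_t(x_t) &= \max_{a_t \in \mathcal{A}(x_t)} \mathbb{H}[Z_{\tau(x_t, a_t)} \mid Z_{x_t}]
\end{aligned} \qquad (11)$$

for stage $i = 0, \ldots, t-1$. Consequently, the Markov-based policy $\breve{\pi} = (\breve{\pi}_0(x_0), \ldots, \breve{\pi}_t(x_t))$ can be determined by

$$\begin{aligned}
\breve{\pi}_i(x_i) &= \arg\max_{a_i \in \mathcal{A}(x_i)} \left\{ \mathbb{H}[Z_{\tau(x_i, a_i)} \mid Z_{x_i}] + \breve{V}_{i+1}(\tau(x_i, a_i)) \right\} \\
\breve{\pi}_t(x_t) &= \arg\max_{a_t \in \mathcal{A}(x_t)} \mathbb{H}[Z_{\tau(x_t, a_t)} \mid Z_{x_t}]
\end{aligned} \qquad (12)$$

for stage $i = 0, \ldots, t-1$.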
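A minimal sketch of (11)-(12) follows. Because the GP is second-order stationary, $\mathbb{H}[Z_{\tau(x_i, a_i)} \mid Z_{x_i}]$ depends only on which rows are occupied in two adjacent columns, so the sketch computes these $|\mathcal{A}|^2$ stagewise entropies once and then runs a backward pass over the t+1 stages; this appears to mirror the two terms of the bound in Theorem 1. The backward pass also yields a policy for every possible starting $x_0$ at once. The kernel hyperparameters (in grid units) and helper names are assumptions for illustration.

```python
import math
from itertools import combinations


def se_cov(u, v, sig_s2=1.0, sig_n2=0.01, l1=1.5, l2=1.0):
    """Squared exponential covariance (1); illustrative hyperparameters in grid units."""
    d2 = ((u[0] - v[0]) / l1) ** 2 + ((u[1] - v[1]) / l2) ** 2
    return sig_s2 * math.exp(-0.5 * d2) + (sig_n2 if u == v else 0.0)


def cholesky(a):
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][p] * low[j][p] for p in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def forward_solve(low, b):
    y = []
    for i, row in enumerate(low):
        y.append((b[i] - sum(row[p] * y[p] for p in range(i))) / row[i])
    return y


def cond_entropy(targets, given):
    """H[Z_targets | Z_given] via the posterior covariance (3) and entropy (6)."""
    n = len(targets)
    cov = [[se_cov(a, b) for b in targets] for a in targets]
    if given:
        low = cholesky([[se_cov(a, b) for b in given] for a in given])
        w = [forward_solve(low, [se_cov(g, t) for g in given]) for t in targets]
        cov = [[cov[i][j] - sum(p * q for p, q in zip(w[i], w[j])) for j in range(n)]
               for i in range(n)]
    low = cholesky(cov)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return 0.5 * (n * math.log(2.0 * math.pi * math.e) + log_det)


def markov_policy(n_cols, n_rows, k):
    """Backward induction for (11)-(12); returns breve V_0 and [breve pi_0, ..., breve pi_t]."""
    states = list(combinations(range(n_rows), k))
    # Stationarity: the entropy of a move between adjacent columns depends only on
    # the two row sets, so the |A|^2 table is built once from columns 0 and 1.
    ent = {(s, a): cond_entropy([(1, r) for r in a], [(0, r) for r in s])
           for s in states for a in states}
    value = {s: 0.0 for s in states}              # value beyond the last stage
    policy = []
    for _ in range(n_cols - 1):                   # stages t, t-1, ..., 0
        best = {s: max(states, key=lambda a, s=s: ent[s, a] + value[a]) for s in states}
        value = {s: ent[s, best[s]] + value[best[s]] for s in states}
        policy.insert(0, best)
    return value, policy


value, policy = markov_policy(n_cols=30, n_rows=5, k=2)   # 5 x 30 grid as in Fig. 1
rows, path = (0, 4), []                                     # one possible starting x_0
for stage_policy in policy:                                 # roll out breve pi (12)
    rows = stage_policy[rows]
    path.append(rows)
print(len(path), round(value[(0, 4)], 3))                   # 29 stages, breve V_0(x_0)
```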



4.1 Time Complexity: Analysis & Comparison 

Theorem 1. Let $\mathcal{A} = \mathcal{A}(x_0) = \ldots = \mathcal{A}(x_t)$. Deriving the Markov-based policy $\breve{\pi}$ (12) for the transect sampling task requires $O(|\mathcal{A}|^2 (t + k^3))$ time.

Note that $|\mathcal{A}| = \binom{r}{k} = O(r^k)$, where r is the number of sampling locations per column and $k \le r$ as assumed in Section 2. Though $|\mathcal{A}|$ is exponential in the number k of robots, r is expected to be small in a transect, which prevents $|\mathcal{A}|$ from growing too large.

In contrast, deriving iMASP-based policy $\pi^*$ (8) requires $O(|\mathcal{A}|^t t^3 k^3)$ time. Deriving greedy policies $\pi^{E}$ (9) and $\pi^{M}$ (10) incurs, respectively, $O(|\mathcal{A}| t^4 k^3 + |\mathcal{A}|^2 t k^3)$ and $O(|\mathcal{A}| t |\mathcal{U}|^3 + |\mathcal{A}|^2 t k^3) = O(|\mathcal{A}| t^4 r^3 + |\mathcal{A}|^2 t k^3)$ time to compute the observation paths over all possible choices of starting robot locations. Clearly, none of the non-Markovian strategies scales as well as our Markov-based approach with increasing length t+1 of planning horizon or number t+2 of columns, which is expected to be large. As demonstrated empirically (Section 5), the Markov-based policy can be derived faster than $\pi^{E}$ and $\pi^{M}$ by more than an order of magnitude; this computational advantage is boosted further for transect sampling tasks with unknown starting robot locations.
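A back-of-the-envelope comparison for the transect of Fig. 1 (r = 5 locations per column, t + 2 = 30 columns) illustrates the gap. The sketch below counts, with constants ignored, the leading term of Theorem 1 for the Markov-based policy and, for contrast, the number of history-action pairs $(x_{0:i}, a_i)$ reachable from one fixed $x_0$ in (7), each of which needs one stagewise entropy evaluation. This count is an illustrative proxy that grows as $|\mathcal{A}|^{t+1}$, not a restatement of the exact complexity expressions above; the function names are hypothetical.

```python
from math import comb


def markov_cost(r, k, t):
    """Leading term |A|^2 (t + k^3) of Theorem 1, constants ignored."""
    a = comb(r, k)
    return a * a * (t + k ** 3)


def imasp_entropy_evals(r, k, t):
    """History-action pairs (x_{0:i}, a_i), i = 0..t, reachable from a fixed x_0 in (7)."""
    a = comb(r, k)
    return sum(a ** i * a for i in range(t + 1))


for k in (1, 2):
    print(k, markov_cost(5, k, 28), f"{imasp_entropy_evals(5, k, 28):.2e}")
```

Even at k = 1 the exhaustive count is astronomically larger than the Markov-based count, which grows only linearly in t.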

4.2 Performance Guarantees 

We will first provide a theoretical guarantee on how the Markov-based policy $\breve{\pi}$ (12) performs relative to the non-Markovian iMASP-based policy $\pi^*$ (8) for the case of 1 robot. This key result follows from our intuition that when the horizontal spatial correlation becomes small, exploiting the past measurements for path planning should hardly improve the active sampling performance in a transect sampling task, thus favoring the Markov-based policy. Though this intuition is simple, supporting it with formal theoretical results and their corresponding proofs (Appendix A) turns out to be non-trivial, as shown below.

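Recall the Markov assumption that $\mathbb{H}[Z_{x_{i+1}} \mid Z_{x_{0:i}}]$ (6) is to be approximated by $\mathbb{H}[Z_{x_{i+1}} \mid Z_{x_i}]$. This prompts us to first consider bounding the difference of these posterior entropies that ensues from the Markov property:

$$\mathbb{H}[Z_{x_{i+1}} \mid Z_{x_i}] - \mathbb{H}[Z_{x_{i+1}} \mid Z_{x_{0:i}}] = \frac{1}{2} \log \frac{\sigma_{x_{i+1} \mid x_i}}{\sigma_{x_{i+1} \mid x_{0:i}}} \ge 0 \qquad (13)$$

where, for 1 robot, $\sigma_{x_{i+1}|x_i}$ and $\sigma_{x_{i+1}|x_{0:i}}$ denote the posterior variances (3) of $Z_{x_{i+1}}$. This difference can be interpreted as the reduction in uncertainty of the measurements $Z_{x_{i+1}}$ to be observed next in stage $i+1$ by observing the past measurements $Z_{x_{0:i-1}}$ from stages 0 to $i-1$ given the current measurements $Z_{x_i}$ observed in stage i. This difference is small if $Z_{x_{0:i-1}}$ does not contribute much to the reduction in uncertainty of $Z_{x_{i+1}}$ given $Z_{x_i}$. It (13) is often known as the conditional mutual information of $Z_{x_{i+1}}$ and $Z_{x_{0:i-1}}$ given $Z_{x_i}$, denoted by

$$\mathbb{I}[Z_{x_{i+1}}; Z_{x_{0:i-1}} \mid Z_{x_i}] \triangleq \mathbb{H}[Z_{x_{i+1}} \mid Z_{x_i}] - \mathbb{H}[Z_{x_{i+1}} \mid Z_{x_{0:i}}]$$

which is of value 0 if the Markov property holds.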
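The quantity in (13) is easy to evaluate numerically for a single robot. The sketch below, with illustrative hyperparameters in grid units (grid widths of 1 assumed, so l1 plays the role of the normalized length-scale $\ell'_1$ defined next), computes $\mathbb{I}[Z_{x_2}; Z_{x_0} \mid Z_{x_1}]$ for a robot that stays in one row across three columns, and shows it shrinking as the horizontal length-scale decreases, in line with the intuition stated at the start of this section. The helper names are hypothetical.

```python
import math


def se_cov(u, v, l1, l2=1.0, sig_s2=1.0, sig_n2=0.01):
    """Squared exponential covariance (1); illustrative hyperparameters in grid units."""
    d2 = ((u[0] - v[0]) / l1) ** 2 + ((u[1] - v[1]) / l2) ** 2
    return sig_s2 * math.exp(-0.5 * d2) + (sig_n2 if u == v else 0.0)


def posterior_var(u, given, l1):
    """sigma_{u|given} from (3), solved explicitly for a 1- or 2-point history."""
    kuu = se_cov(u, u, l1)
    if len(given) == 1:
        g = given[0]
        return kuu - se_cov(u, g, l1) ** 2 / se_cov(g, g, l1)
    a, b = given
    saa, sbb, sab = se_cov(a, a, l1), se_cov(b, b, l1), se_cov(a, b, l1)
    ka, kb = se_cov(u, a, l1), se_cov(u, b, l1)
    det = saa * sbb - sab ** 2                       # 2 x 2 inverse written out
    return kuu - (sbb * ka ** 2 - 2 * sab * ka * kb + saa * kb ** 2) / det


def markov_cmi(l1):
    """(13) for one robot in row 0: 0.5 * log(sigma_{x2|x1} / sigma_{x2|x0:1})."""
    x0, x1, x2 = (0, 0), (1, 0), (2, 0)
    return 0.5 * math.log(posterior_var(x2, [x1], l1) / posterior_var(x2, [x0, x1], l1))


for l1 in (0.5, 1.0, 2.0):
    print(l1, round(markov_cmi(l1), 4))
```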

The results to follow assume that the transect is discretized into a grid of sampling locations. Let $\omega_1$ and $\omega_2$ denote the horizontal and vertical grid discretization widths (i.e., separations between adjacent sampling locations), respectively. Let $\ell'_1 = \ell_1/\omega_1$ and $\ell'_2 = \ell_2/\omega_2$ represent the normalized horizontal and vertical length-scale components, respectively.


The following lemma bounds the variance reduction term:

Lemma 2. Let $\xi = \exp\{-1/(2\ell_1'^2)\}$ and $\rho = 1 + \sigma_n^2/\sigma_s^2$.

The next lemma is fundamental to the subsequent results on the active sampling performance of Markov-based policy $\breve{\pi}$. It provides bounds on $\mathbb{I}[Z_{x_{i+1}}; Z_{x_{0:i-1}} \mid Z_{x_i}]$, which follow immediately from (13), Lemma 2, and the lower bound

Lemma 3. If $\xi < \rho/i$, then $0 \le \mathbb{I}[Z_{x_{i+1}}; Z_{x_{0:i-1}} \mid Z_{x_i}] \le \Delta(i)$ where $\Delta(i) = \cdots$.
Remark. If $j < s$, then $\Delta(j) < \Delta(s)$ for $j, s = 0, \ldots, t$.

From Lemma 3, since $\Delta(i)$ bounds $\mathbb{I}[Z_{x_{i+1}}; Z_{x_{0:i-1}} \mid Z_{x_i}]$ from above, a small $\mathbb{I}[Z_{x_{i+1}}; Z_{x_{0:i-1}} \mid Z_{x_i}]$ can be guaranteed by making $\Delta(i)$ small. From the definition of $\Delta(i)$, there are a few ways to achieve a small $\Delta(i)$: (a) $\Delta(i)$ depends on $\ell'_1$ through $\xi$. As $\ell'_1 \to 0^+$, $\xi \to 0^+$, by definition. Consequently, $\Delta(i) \to 0^+$. A small $\ell'_1$ can be obtained using a small $\ell_1$ and/or a large $\omega_1$, by definition; (b) $\Delta(i)$ also depends on the noise-to-signal ratio $\sigma_n^2/\sigma_s^2$ through $\rho$. Raising $\sigma_n^2$ or lowering $\sigma_s^2$ increases $\rho$, by definition. This, in turn, decreases $\Delta(i)$; (c) since i indicates the length of history of observations, the remark after Lemma 3 tells us that a shorter length produces a smaller $\Delta(i)$. To summarize, (a) environmental field conditions such as smaller horizontal spatial correlation and noisy, less intense fields, and (b) sampling task settings such as larger horizontal grid discretization width and shorter length of history of observations all contribute to smaller $\Delta(i)$, and hence smaller $\mathbb{I}[Z_{x_{i+1}}; Z_{x_{0:i-1}} \mid Z_{x_i}]$. This analysis is important for understanding the practical implication of our theoretical results later. A limitation with using Lemma 3 is that of the sufficient condition $\xi < \rho/i$, which will hold if the field conditions and task settings identified above to make $\Delta(i)$ small are adequately satisfied.

The following theorem uses the induced optimal value $\breve{V}_0(x_0)$ from solving the Markov-based path planning problem (11) to bound the maximum entropy $V_0^{\pi^*}(x_0)$ of observation paths achieved by $\pi^*$ from solving iMASP (7):

Theorem 4. Let $\epsilon_i = \sum_{s=i}^{t} \Delta(s) \le (t - i + 1)\Delta(t)$. If $\xi < \rho/t$, then $\breve{V}_i(x_i) - \epsilon_i \le V_i^{\pi^*}(x_{0:i}) \le \breve{V}_i(x_i)$ for $i = 0, \ldots, t$.

The above result is useful in providing an efficient way of knowing the maximum entropy $V_0^{\pi^*}(x_0)$, albeit approximately: the time needed to derive the two-sided bounds on $V_0^{\pi^*}(x_0)$ is linear in the length of planning horizon (Theorem 1), as opposed to the exponential time required to compute the exact value of $V_0^{\pi^*}(x_0)$. Since the error bound $\epsilon_0$ is defined as a sum of $\Delta(s)$'s, we can rely on the above analysis of $\Delta(s)$ (see paragraph after Lemma 3) to improve this error bound: (a) environmental field conditions such as smaller horizontal spatial correlation and noisy, less intense fields, and (b) sampling task settings such as larger horizontal grid discretization width and shorter planning horizon (i.e., fewer transect columns) all improve this error bound.

In the main result below, the Markov-based policy $\breve{\pi}$ is guaranteed to achieve an entropy $V_0^{\breve{\pi}}(x_0)$ of observation paths (i.e., by plugging $\breve{\pi}$ into (5)) that is not more than $\epsilon_0$ from the maximum entropy $V_0^{\pi^*}(x_0)$ of observation paths achieved by policy $\pi^*$:

Theorem 5. If $\xi < \rho/t$, then policy $\breve{\pi}$ is $\epsilon_0$-optimal in achieving the maximum-entropy criterion, i.e., $V_0^{\pi^*}(x_0) - V_0^{\breve{\pi}}(x_0) \le \epsilon_0$.

Again, since the error bound $\epsilon_0$ is defined as a sum of $\Delta(s)$'s, we can use the above analysis of $\Delta(s)$ to improve this bound: (a) environmental field conditions such as smaller horizontal spatial correlation and noisy, less intense fields, and (b) sampling task settings such as larger horizontal grid discretization width and shorter planning horizon (i.e., fewer transect columns) all result in smaller $\epsilon_0$, and hence improve the active sampling performance of Markov-based policy $\breve{\pi}$ relative to that of non-Markovian iMASP-based policy $\pi^*$. This not only supports our prior intuition (see first paragraph of this section) but also identifies other means of limiting the performance degradation of the Markov-based policy.
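Computing $\epsilon_0$ requires the bounds $\Delta(s)$, but two neighbouring inequalities can be checked numerically on any small instance: $V_0^{\breve{\pi}}(x_0) \le V_0^{\pi^*}(x_0)$ (by optimality of $\pi^*$) and $V_0^{\pi^*}(x_0) \le \breve{V}_0(x_0)$ (conditioning on fewer measurements cannot lower a Gaussian entropy, which is the upper bound of Theorem 4). The gap $\breve{V}_0(x_0) - V_0^{\breve{\pi}}(x_0)$ is therefore an a posteriori bound on the loss that Theorem 5 bounds a priori by $\epsilon_0$. The sketch below computes all three quantities on a tiny grid with illustrative hyperparameters in grid units; its brute-force maximum is exponential in the number of columns and is only feasible for toy sizes, and the helper names are hypothetical.

```python
import math
from itertools import combinations, product


def se_cov(u, v, sig_s2=1.0, sig_n2=0.01, l1=1.5, l2=1.0):
    """Squared exponential covariance (1); illustrative hyperparameters in grid units."""
    d2 = ((u[0] - v[0]) / l1) ** 2 + ((u[1] - v[1]) / l2) ** 2
    return sig_s2 * math.exp(-0.5 * d2) + (sig_n2 if u == v else 0.0)


def cholesky(a):
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][p] * low[j][p] for p in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def forward_solve(low, b):
    y = []
    for i, row in enumerate(low):
        y.append((b[i] - sum(row[p] * y[p] for p in range(i))) / row[i])
    return y


def cond_entropy(targets, given):
    """H[Z_targets | Z_given] via the posterior covariance (3) and entropy (6)."""
    n = len(targets)
    cov = [[se_cov(a, b) for b in targets] for a in targets]
    if given:
        low = cholesky([[se_cov(a, b) for b in given] for a in given])
        w = [forward_solve(low, [se_cov(g, t) for g in given]) for t in targets]
        cov = [[cov[i][j] - sum(p * q for p, q in zip(w[i], w[j])) for j in range(n)]
               for i in range(n)]
    low = cholesky(cov)
    return 0.5 * (n * math.log(2.0 * math.pi * math.e)
                  + 2.0 * sum(math.log(low[i][i]) for i in range(n)))


def column(col, rows):
    return [(col, r) for r in rows]


def sandwich(x0_rows, n_cols, n_rows, k):
    """Return (V_0^{breve pi}(x_0), V_0^{pi*}(x_0), breve V_0(x_0)) on a toy transect."""
    states = list(combinations(range(n_rows), k))
    ent = {(s, a): cond_entropy(column(1, a), column(0, s)) for s in states for a in states}
    value, policy = {s: 0.0 for s in states}, []
    for _ in range(n_cols - 1):                   # Markov DP (11), stages t, ..., 0
        best = {s: max(states, key=lambda a, s=s: ent[s, a] + value[a]) for s in states}
        value = {s: ent[s, best[s]] + value[best[s]] for s in states}
        policy.insert(0, best)
    x0 = column(0, x0_rows)
    rows, path = x0_rows, []
    for col, stage_policy in enumerate(policy, start=1):   # roll out breve pi (12)
        rows = stage_policy[rows]
        path += column(col, rows)
    v_breve_pi = cond_entropy(path, x0)           # true entropy (4) of its paths
    v_star = max(cond_entropy([u for c, a in enumerate(p, 1) for u in column(c, a)], x0)
                 for p in product(states, repeat=n_cols - 1))   # exhaustive pi*
    return v_breve_pi, v_star, value[x0_rows]


v_pi, v_star, v_breve = sandwich((0,), n_cols=5, n_rows=3, k=1)
print(v_pi <= v_star + 1e-9, v_star <= v_breve + 1e-9)   # True True
```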

For the multi-robot case, a condition has to be imposed on the covariance structure of GP to obtain a similar guarantee:

for $m = 0, \ldots, i$ and any $u, v, x_0, x_1, \ldots, x_i \in \mathcal{U}$. Intuitively, (14) says that further conditioning does not make $Z_u$ and $Z_v$ more correlated. Note that (14) is satisfied if $u = v$.

Similar to Lemma 3 for the 1-robot case, we can bound $\mathbb{I}[Z_{x_{i+1}}; Z_{x_{0:i-1}} \mid Z_{x_i}]$ for the multi-robot case, but tighter conditions have to be satisfied:

Lemma 6. Let $\ell'_1 = \ell'_2$. If $\xi < \min(\rho/(tk), \rho/(4k))$ and (14) is satisfied, then $0 \le \mathbb{I}[Z_{x_{i+1}}; Z_{x_{0:i-1}} \mid Z_{x_i}] \le \Delta_k(i)$ where $\Delta_k(i) = \cdots$.

To improve the upper bound $\Delta_k(i)$, the above analysis of $\Delta(i)$ can be applied here, as these two upper bounds are largely similar: (a) environmental field conditions such as smaller spatial correlation and noisy, less intense fields, and (b) sampling task settings such as larger grid discretization width and shorter planning horizon (i.e., fewer transect columns) all entail smaller $\Delta_k(i)$. Decreasing the number k of robots also reduces $\Delta_k(i)$, thus yielding tighter bounds on $\mathbb{I}[Z_{x_{i+1}}; Z_{x_{0:i-1}} \mid Z_{x_i}]$. Using Lemma 6, we can derive guarantees similar to those of Theorems 4 and 5 on the performance of Markov-based policy $\breve{\pi}$ for the multi-robot case.

5. EXPERIMENTS AND DISCUSSION 



In Section 4.2, we have highlighted the practical implication of our main theoretical result (i.e., Theorem 5), which establishes various environmental field conditions and sampling task settings to limit the performance degradation of Markov-based policy $\breve{\pi}$. This result, however, does not reveal whether $\breve{\pi}$ performs well (or not) under "seemingly" less favorable field conditions and task settings that do not jointly satisfy its sufficient condition $\xi < \rho/(tk)$. These include large spatial correlation, less noisy, highly intense fields, small grid discretization width, long planning horizon (i.e., many transect columns), and large number of robots. So, this section evaluates the active sampling performance and time efficiency of $\breve{\pi}$ empirically on two real-world datasets under such field conditions and task settings, as detailed below: (a) May 2009 temperature field data of Panther Hollow Lake in Pittsburgh, PA, spanning 25 m by 150 m, and (b) June 2009 plankton density field data of Chesapeake Bay spanning 314 m by 1765 m.

Using maximum likelihood estimation (MLE) [8], the learned hyperparameters (i.e., horizontal and vertical length-scales, signal and noise variances) are, respectively, $\ell_1 = 40.45$ m, $\ell_2 = 16.00$ m, $\sigma_s^2 = 0.1542$, and $\sigma_n^2 = 0.0036$ for the temperature field, and $\ell_1 = 27.53$ m, $\ell_2 = 134.64$ m, $\sigma_s^2 = 2.152$, and $\sigma_n^2 = 0.041$ for the plankton density field. It can be observed that the temperature and plankton density fields have low noise-to-signal ratios $\sigma_n^2/\sigma_s^2$ of 0.023 and 0.019, respectively. Relative to the size of the transect, both fields have large vertical spatial correlations, but only the temperature field has large horizontal spatial correlation.
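To make these hyperparameters concrete, the sketch below plugs them into an anisotropic squared-exponential covariance with additive noise. The kernel form is an assumption here (a common choice for GP-based fields; it is not restated in this section), and the names `HYPERS`, `se_cov`, and `noise_to_signal` are hypothetical. It reproduces the two noise-to-signal ratios quoted above and contrasts horizontal and vertical correlation at a 5 m separation.

```python
import numpy as np

# Learned hyperparameters quoted in the text (length-scales in metres).
HYPERS = {
    "temperature": {"l1": 40.45, "l2": 16.00, "sig_s2": 0.1542, "sig_n2": 0.0036},
    "plankton": {"l1": 27.53, "l2": 134.64, "sig_s2": 2.152, "sig_n2": 0.041},
}


def se_cov(u, v, l1, l2, sig_s2, sig_n2):
    """Assumed anisotropic squared-exponential covariance plus i.i.d. noise.

    u, v are (horizontal, vertical) coordinates in metres.
    """
    d = (np.asarray(u, float) - np.asarray(v, float)) / np.array([l1, l2])
    k = sig_s2 * np.exp(-0.5 * float(d @ d))
    # Noise variance only contributes when both arguments are the same location.
    return k + (sig_n2 if np.array_equal(u, v) else 0.0)


def noise_to_signal(h):
    return h["sig_n2"] / h["sig_s2"]


for name, h in HYPERS.items():
    # Correlation of noise-free measurements 5 m apart along each axis.
    horiz = se_cov((0, 0), (5, 0), **h) / h["sig_s2"]
    vert = se_cov((0, 0), (0, 5), **h) / h["sig_s2"]
    print(f"{name}: noise/signal={noise_to_signal(h):.3f}, "
          f"corr@5m horiz={horiz:.3f}, vert={vert:.3f}")
```

For the plankton field the vertical correlation at a given separation is markedly larger than the horizontal one, matching the qualitative description above.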

The performance of Markov-based policy $\pi$ is compared to that of non-Markovian policies produced by two state-of-the-art information-theoretic exploration strategies: greedy policies $\pi^E$ (9) and $\pi^M$ (10) proposed by [t] and [3], respectively. The non-Markovian policy $\pi^*$, which has to be derived approximately using Learning Real-Time A*, is excluded from comparison for the reason provided in Section 1.

5.1 Performance Metrics 

The tested policies are evaluated using the two metrics proposed in [t], which quantify the mapping uncertainty of the unobserved areas of the field differently: (a) The ENT($\pi$) metric measures the posterior joint entropy $\mathbb{H}[Z_{\bar{x}_{0:t+1}} \mid Z_{x_{0:t+1}}]$ of field measurements $Z_{\bar{x}_{0:t+1}}$ at unobserved locations $\bar{x}_{0:t+1}$, where $\bar{x}_{0:t+1}$ denotes the vector comprising the locations of domain $\mathcal{U}$ not found in the sampled locations $x_{0:t+1}$ selected by policy $\pi$. Smaller ENT($\pi$) implies lower mapping uncertainty; (b) The ERR($\pi$) metric measures the mean-squared relative error $|\mathcal{U}|^{-1}\sum_{u\in\mathcal{U}}\{(z_u - \mu_{u|x_{0:t+1}})/\bar{\mu}\}^2$ resulting from using the observations (i.e., sampled locations $x_{0:t+1}$ and corresponding measurements $z_{x_{0:t+1}}$) selected by policy $\pi$ and the posterior mean $\mu_{u|x_{0:t+1}}$ to predict the field, where $\bar{\mu} = |\mathcal{U}|^{-1}\sum_{u\in\mathcal{U}} z_u$. Smaller ERR($\pi$) implies higher prediction accuracy. Two noteworthy differences distinguish these metrics: (a) The ENT($\pi$) metric exploits the spatial correlation between field measurements in the unobserved areas, whereas the ERR($\pi$) metric implicitly assumes independence between them. As a result, unlike the ERR($\pi$) metric, the ENT($\pi$) metric does not overestimate the mapping uncertainty. To illustrate this, suppose the unknown field measurements are restricted to only two unobserved locations $u$ and $v$ residing in a highly uncertain area, and they are highly correlated due to spatial proximity. The behavior of the ENT($\pi$) metric can be understood by applying the chain rule for entropy (i.e., $\mathrm{ENT}(\pi) = \mathbb{H}[Z_u, Z_v \mid Z_{x_{0:t+1}}] = \mathbb{H}[Z_u \mid Z_{x_{0:t+1}}] + \mathbb{H}[Z_v \mid Z_{x_{0:t+1}}, Z_u]$); the latter uncertainty term (i.e., the posterior entropy of $Z_v$) is significantly reduced or "discounted" due to the high spatial correlation between $Z_u$ and $Z_v$. Hence, the mapping uncertainty of these two unobserved locations is not overestimated. A practical advantage of this metric is that it does not overcommit sensing resources; in the simple illustration above, a single observation at either location $u$ or $v$ suffices to learn both field measurements well. On the other hand, the ERR($\pi$) metric considers each location to be of high uncertainty due to the independence assumption; (b) In contrast to the ENT($\pi$) metric, the ERR($\pi$) metric can use ground truth measurements to evaluate whether the field is being mapped accurately.

Let ENTD($\pi'$) = ENT($\pi$) $-$ ENT($\pi'$) and ERRD($\pi'$) = ERR($\pi$) $-$ ERR($\pi'$) for a non-Markovian policy $\pi' \in \{\pi^E, \pi^M\}$. Decreasing ENTD($\pi'$) improves the ENT performance of $\pi$ relative to that of $\pi'$. Small |ENTD($\pi'$)| implies that $\pi$ achieves ENT performance comparable to that of $\pi'$. ERRD($\pi'$) can be interpreted likewise. Additionally, we consider the time taken to derive each policy as the third metric.

Figure 2: Temperature fields (measured in °C) with varying horizontal length-scale $\ell_1$ and vertical length-scale $\ell_2$.
(c) Field 3: $\ell_1 = 40.45$ m, $\ell_2 = 5.00$ m. (d) Field 4: $\ell_1 = 40.45$ m, $\ell_2 = 16.00$ m.
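The two metrics can be computed directly from GP quantities. The following is a minimal NumPy sketch, assuming the prior covariance blocks between unobserved (`K_uu`), unobserved/observed (`K_uo`), and observed (`K_oo`) locations are already available; all function names are hypothetical. It also replays the two-location illustration with an assumed correlation of 0.95: the joint entropy equals $\mathbb{H}[Z_u] + \mathbb{H}[Z_v \mid Z_u]$ by the chain rule and falls well below $2\,\mathbb{H}[Z_u]$, the value an independence treatment would assign.

```python
import numpy as np


def gaussian_entropy(cov):
    """Differential entropy (nats) of a multivariate Gaussian with covariance cov."""
    n = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (n * np.log(2 * np.pi * np.e) + logdet)


def posterior_cov(K_uu, K_uo, K_oo):
    """Posterior covariance of unobserved Z_U given observed Z_O (Gaussian conditioning)."""
    return K_uu - K_uo @ np.linalg.solve(K_oo, K_uo.T)


def ent_metric(K_uu, K_uo, K_oo):
    """ENT: posterior joint entropy of the unobserved measurements."""
    return gaussian_entropy(posterior_cov(K_uu, K_uo, K_oo))


def err_metric(z_true, post_mean):
    """ERR: mean-squared error relative to mu_bar, the mean true measurement over U."""
    z_true = np.asarray(z_true, float)
    mu_bar = z_true.mean()
    return float(np.mean(((z_true - np.asarray(post_mean, float)) / mu_bar) ** 2))


# Two-location illustration: u and v strongly correlated, nothing observed yet.
K = np.array([[1.0, 0.95], [0.95, 1.0]])
h_u = gaussian_entropy(K[:1, :1])                                    # H[Z_u]
h_v_given_u = gaussian_entropy(posterior_cov(K[1:, 1:], K[1:, :1], K[:1, :1]))
joint = gaussian_entropy(K)                                          # H[Z_u, Z_v]
print(joint, h_u + h_v_given_u, 2 * h_u)
```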

5.2 Temperature Field Data 

We will first investigate how varying spatial correlations (i.e., varying length-scales) of the temperature field affect the ENT($\pi$) and ERR($\pi$) performance of the evaluated policies. The temperature field is discretized into a 5 × 30 grid of sampling locations as shown in Figs. 1 and 2d. The horizontal and/or vertical length-scales of the original field (i.e., field 4 in Fig. 2d) are reduced to produce modified fields 1, 2, and 3 (respectively, Figs. 2a, 2b, and 2c); we fix these reduced length-scales while learning the remaining hyperparameters (i.e., signal and noise variances) through MLE.

Table 1 shows the results of mean ENT($\pi$) and ERR($\pi$) performance of the tested policies (i.e., averaged over all possible starting robot locations) with varying length-scales and number of robots. The ENT($\pi$) and ERR($\pi$) of all policies generally decrease with increasing length-scales (except ERR($\pi$) for 1 robot from field 2 to 4) due to increasing spatial correlation between measurements, thus resulting in lower mapping uncertainty.

For the case of 1 robot, the observations are as follows: (a) When $\ell_2$ is kept constant (i.e., at 5 m or 16 m), reducing $\ell_1$ from 40.45 m to 5 m (i.e., from field 3 to 1 or field 4 to 2) decreases ENTD($\pi^E$), ERRD($\pi^E$), ENTD($\pi^M$), and ERRD($\pi^M$): when the horizontal correlation becomes small, it can no longer be exploited by the non-Markovian policies $\pi^E$ and $\pi^M$; (b) For field 3 with large $\ell_1$ and small $\ell_2$, ENTD($\pi^E$) and ENTD($\pi^M$) are large because the Markov property of $\pi$ prevents it from exploiting the large horizontal correlation; (c) When $\ell_1$ is kept constant (i.e., at 5 m or 40.45 m), reducing $\ell_2$ from 16 m to 5 m (i.e., from field 2 to 1 or field 4 to 3) increases ERRD($\pi^E$) and ERRD($\pi^M$): when the vertical correlation becomes small, it can no longer be exploited by $\pi$, thus incurring larger ERR($\pi$).

Table 1: Comparison of ENT($\pi$) (left) and ERR($\pi$) (×10^{−…}) (right) performance for temperature fields that are discretized into 5 × 30 grids (Fig. 2).

Table 2: Comparison of ENT($\pi$) (left) and ERR($\pi$) (×10^{−…}) (right) performance for the temperature field that is discretized into a 13 × 75 grid.

For the case of 2 robots, the observations are as follows: (a) |ENTD($\pi^E$)| and |ENTD($\pi^M$)| are small for all fields except field 2, where $\pi$ significantly outperforms $\pi^M$. In particular, when $\ell_2$ is kept constant (i.e., at 5 m or 16 m), reducing $\ell_1$ from 40.45 m to 5 m (i.e., from field 3 to 1 or field 4 to 2) decreases ENTD($\pi^E$), ENTD($\pi^M$), and ERRD($\pi^M$): this is explained in the first observation of the 1-robot case; (b) For field 3 with large $\ell_1$ and small $\ell_2$, ERRD($\pi^E$) and ERRD($\pi^M$) are large: this is explained in the second and third observations of the 1-robot case; (c) When $\ell_1$ is kept constant (i.e., at 5 m or 40.45 m), reducing $\ell_2$ from 16 m to 5 m (i.e., from field 2 to 1 or field 4 to 3) increases ERRD($\pi^E$): this is explained in the third observation of the 1-robot case. This also holds for ERRD($\pi^M$) when $\ell_1$ is large.

For the case of 3 robots, it can be observed that $\pi$ can achieve ENT($\pi$) and ERR($\pi$) performance comparable to (if not better than) that of $\pi^E$ and $\pi^M$ for all fields.

To summarize the above observations on spatial correlation conditions favoring $\pi$ over $\pi^E$ and $\pi^M$: $\pi$ can achieve ENT($\pi$) performance comparable to (if not better than) that of $\pi^E$ and $\pi^M$ for all fields with any number of robots, except for field 3 (i.e., of large $\ell_1$ and small $\ell_2$) with 1 robot, as explained previously. Policy $\pi$ can achieve comparable ERR($\pi$) performance for field 2 (i.e., of small $\ell_1$ and large $\ell_2$) with 1 robot because $\pi$ is capable of exploiting the large vertical correlation, while the small horizontal correlation cannot be exploited by $\pi^E$ and $\pi^M$. Policy $\pi$ can also achieve comparable ERR($\pi$) performance for all fields with 2 and 3 robots, except for field 3 (i.e., of large $\ell_1$ and small $\ell_2$) with 2 robots. These observations reveal that (a) small horizontal and large vertical correlations are favorable to $\pi$; (b) though large horizontal and small vertical correlations are not favorable to $\pi$, this problem can be mitigated by increasing the number of robots. For more detailed analysis (e.g., visualization of planned observation paths and their corresponding error maps), the interested reader is referred to [8].

Figure 3: Plankton density (chl-a) field (measured in mg m$^{-3}$) spatially distributed over a 314 m × 1765 m transect that is discretized into an 8 × 45 grid with $\ell_1 = 27.53$ m and $\ell_2 = 134.64$ m.

We will now examine how increasing the resolution to a 13 × 75 grid affects the ENT($\pi$) and ERR($\pi$) performance of the evaluated policies; the resulting grid discretization width is about 0.4 times as large and the planning horizon about 2.5 times as long as before. Table 2 shows the results of mean ENT($\pi$) and ERR($\pi$) performance of the tested policies with varying number of robots, from which we can derive observations similar to those for temperature field 4 discretized into a 5 × 30 grid: $\pi$ can achieve ENT($\pi$) and ERR($\pi$) performance comparable to (if not better than) that of $\pi^E$ and $\pi^M$, except for ERR($\pi$) performance with 1 robot. So, increasing the grid resolution does not seem to noticeably degrade the active sampling performance of $\pi$ relative to that of $\pi^E$ and $\pi^M$.

5.3 Plankton Density Field Data 

Fig. 3 illustrates the plankton density field that is discretized into an 8 × 45 grid. Table 3 shows the results of mean ENT($\pi$) and ERR($\pi$) performance of the tested policies with varying number of robots. The observations are as follows: $\pi$ can achieve the same ENT($\pi$) and ERR($\pi$) performance as that of $\pi^E$ and superior ENT($\pi$) performance over that of $\pi^M$ because small horizontal and large vertical correlations favor $\pi$, as explained in Section 5.2. By increasing the number of robots (i.e., $k > 2$), $\pi$ can achieve ERR($\pi$) performance comparable to (if not better than) that of $\pi^M$.

Table 4 shows the results of mean ENT($\pi$) and ERR($\pi$) performance of the tested policies after increasing the resolution to a 16 × 89 grid; the resulting grid discretization width is about 0.5 times as large and the planning horizon about 2 times as long as before. Similar observations can be made: $\pi$ can achieve ENT($\pi$) performance comparable to that of $\pi^E$ and superior ENT($\pi$) performance over that of $\pi^M$. By deploying more than 1 robot, $\pi$ can achieve ERR($\pi$) performance comparable to (if not better than) that of $\pi^E$ and $\pi^M$. Again, increasing the grid resolution does not seem to noticeably degrade the active sampling performance of $\pi$ relative to that of $\pi^E$ and $\pi^M$.
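The approximate resolution ratios quoted for both datasets follow from the transect extents and grid sizes given in the text. The sketch below assumes the horizontal discretization width is the transect length divided by the number of columns and that the planning horizon scales with the number of columns; both simplifications and the helper names are hypothetical. It yields ratios of 0.40 and 2.50 for the temperature grids and roughly 0.51 and 1.98 for the plankton grids.

```python
# Transect extents (vertical, horizontal) in metres and grids (rows, columns)
# taken from the text; width = extent / count is a simplifying assumption.
SETUPS = {
    "temperature": ((25.0, 150.0), (5, 30), (13, 75)),
    "plankton": ((314.0, 1765.0), (8, 45), (16, 89)),
}


def horizontal_width(extent, grid):
    """Approximate horizontal grid discretization width (metres)."""
    (_, length), (_, cols) = extent, grid
    return length / cols


def resolution_change(extent, coarse, fine):
    """Return (width ratio, planning-horizon ratio) when refining the grid."""
    width_ratio = horizontal_width(extent, fine) / horizontal_width(extent, coarse)
    horizon_ratio = fine[1] / coarse[1]  # horizon measured in transect columns
    return width_ratio, horizon_ratio


for name, (extent, coarse, fine) in SETUPS.items():
    w, h = resolution_change(extent, coarse, fine)
    print(f"{name}: width x{w:.2f}, horizon x{h:.2f}")
```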

5.4 Incurred Policy Time 

Fig. 4 shows the time taken to derive the tested policies for sampling the temperature and plankton density fields with varying number of robots and grid resolutions. It can be observed that the time taken to derive $\pi$ is shorter than that needed to derive $\pi^E$ and $\pi^M$ by more than 1 and 4 orders of magnitude, respectively. It is important to point out that Fig. 4 reports the average time taken to derive $\pi^E$ and $\pi^M$ over all possible starting robot locations. So, if the starting robot locations are unknown, the incurred time to derive $\pi^E$ and $\pi^M$ has to be multiplied by the number of possible combinations of starting robot locations. In contrast, $\pi$ caters to all possible starting robot locations, so the incurred time to derive $\pi$ is unaffected. These observations show a considerable computational gain of $\pi$ over $\pi^E$ and $\pi^M$, which supports our time complexity analysis and comparison (Section 4). So, our Markov-based path planner is more time-efficient for in situ, real-time, high-resolution active sampling.

Figure 4: Graph of time taken to derive policy vs. number $k$ of robots for temperature field 4 discretized into (a) 5 × 30 and (b) 13 × 75 grids and plankton density field discretized into (c) 8 × 45 and (d) 16 × 89 grids.

Table 3: Comparison of ENT($\pi$) (left) and ERR($\pi$) (×10^{−…}) (right) performance for the plankton density field that is discretized into an 8 × 45 grid.

Table 4: Comparison of ENT($\pi$) (left) and ERR($\pi$) (×10^{−…}) (right) performance for the plankton density field that is discretized into a 16 × 89 grid.
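The multiplicative factor for unknown starting locations is a combination count. The sketch below computes it for the four grids, assuming that the $k$ robots start at distinct locations of the first transect column (so the factor is the binomial coefficient of rows per column choose $k$); the helper names and `avg_time_per_start` are hypothetical placeholders, not measured times.

```python
from math import comb

# Locations per transect column for the four grids used in Section 5.
ROWS_PER_COLUMN = {"5 x 30": 5, "13 x 75": 13, "8 x 45": 8, "16 x 89": 16}


def n_start_configs(rows, k):
    """Ways to place k robots at distinct locations of the first column.

    Assumption: robots start in the first column and never share a location.
    """
    return comb(rows, k)


def total_greedy_time(avg_time_per_start, rows, k):
    """Time to derive a non-Markovian greedy policy for every possible start."""
    return avg_time_per_start * n_start_configs(rows, k)


for grid, rows in ROWS_PER_COLUMN.items():
    print(grid, [n_start_configs(rows, k) for k in (1, 2, 3)])
```

The Markov-based policy needs no such factor because its value table already covers every configuration of current robot locations.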

6. CONCLUSION 

This paper describes an efficient Markov-based information-theoretic path planner for active sampling of GP-based environmental fields. We have provided theoretical guarantees on the active sampling performance of our Markov-based policy $\pi$ for the transect sampling task, from which ideal environmental field conditions (i.e., small horizontal spatial correlation and noisy, less intense fields) and sampling task settings (i.e., large grid discretization width and short planning horizon) can be established to limit its performance degradation. Empirically, we have shown that $\pi$ can generally achieve active sampling performance comparable to that of the widely-used non-Markovian greedy policies $\pi^E$ and $\pi^M$ under less favorable realistic field conditions (i.e., low noise-to-signal ratio) and task settings (i.e., small grid discretization width and long planning horizon) while enjoying a large computational gain over them. In particular, we have empirically observed that (a) small horizontal and large vertical correlations strongly favor $\pi$; (b) though large horizontal and small vertical correlations do not favor $\pi$, this problem can be mitigated by increasing the number of robots. In fact, deploying a large robot team often produces superior active sampling performance of $\pi$ over $\pi^M$ in our experiments, in addition to a computational gain of more than 4 orders of magnitude. Our Markov-based planner can be used to efficiently achieve more general exploration tasks (e.g., boundary tracking and those in [6, 7]), but the guarantees provided here may not apply. For future work, we will "relax" the Markov assumption by utilizing a longer (but not entire) history of observations in path planning. This can potentially improve the active sampling performance in fields of moderate to large horizontal correlation without incurring as much time as non-Markovian policies.
APPENDIX
A. PROOFS 

A.1 Proof Sketch of Theorem 1

For each vector $x_i$ of current robot locations, the time needed to evaluate the posterior entropy $\mathbb{H}[Z_{T(x_i,a_i)} \mid Z_{x_i}]$ (i.e., using Cholesky factorization) over all possible actions $a_i \in \mathcal{A}(x_i)$ is $|\mathcal{A}| \times O(k^3) = O(|\mathcal{A}|k^3)$. Doing this over all possible vectors of current robot locations in each column thus incurs $|\mathcal{A}| \times O(|\mathcal{A}|k^3) = O(|\mathcal{A}|^2 k^3)$ time, since the vector space of current robot locations in each column is of the same size $|\mathcal{A}|$ as the joint action space. We do not have to compute these posterior entropies again for every column because the entropies evaluated for any one column replicate across different columns. This computational saving is due to the Markov assumption and the problem structure of the transect sampling task. Propagating the optimal values from stages $t$ to $0$ takes $O(|\mathcal{A}|^2 t)$ time. Hence, solving the Markov-based path planning problem, i.e., deriving the Markov-based policy $\pi$, takes $O(|\mathcal{A}|^2(t + k^3))$ time for the transect sampling task.
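The structure of this argument can be seen in a small dynamic program. The sketch below is an illustration under assumed toy settings, not the paper's implementation: a single robot ($k = 1$) that may move to any row of the next column, an assumed anisotropic squared-exponential GP, and hypothetical names throughout. The Markov entropies are computed once for one pair of adjacent columns and reused in every stage, and backward induction then costs $O(R^2 C)$ for $R$ rows and $C$ columns. As a sanity check it also evaluates the non-Markovian optimum by exhaustive search (feasible only because the grid is tiny) and confirms $V^{\pi} \le V^{\pi^*} \le V_0$, the right inequality being the lower half of Theorem 4.

```python
import itertools
import math

import numpy as np

# Toy transect (assumed values, not the paper's data): R rows x C columns,
# one robot (k = 1) that may move to any row of the next column.
R, C = 3, 4
W1, W2 = 1.0, 1.0           # horizontal / vertical grid widths
L1, L2 = 1.5, 2.0           # length-scales
SIG_S2, SIG_N2 = 1.0, 0.05  # signal / noise variances


def loc(col, row):
    return (col * W1, row * W2)


def cov(A, B):
    """Assumed squared-exponential covariance; noise added on identical locations."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    d = (A[:, None, :] - B[None, :, :]) / np.array([L1, L2])
    K = SIG_S2 * np.exp(-0.5 * (d ** 2).sum(axis=-1))
    return K + SIG_N2 * np.all(A[:, None, :] == B[None, :, :], axis=-1)


def post_entropy(u, given):
    """H[Z_u | Z_given] in nats for a single location u."""
    var = cov([u], [u])[0, 0]
    if given:
        k = cov([u], given)
        var -= (k @ np.linalg.solve(cov(given, given), k.T))[0, 0]
    return 0.5 * math.log(2 * math.pi * math.e * var)


# Markov one-step entropies H[Z_(c+1, r2) | Z_(c, r1)]: by stationarity they are
# the same for every column c, so they are computed only once.
H1 = [[post_entropy(loc(1, r2), [loc(0, r1)]) for r2 in range(R)] for r1 in range(R)]


def markov_policy():
    """Backward induction over columns, O(R^2 C) once H1 is known."""
    V = [[0.0] * R for _ in range(C)]
    pi = [[0] * R for _ in range(C - 1)]
    for c in range(C - 2, -1, -1):
        for r in range(R):
            scores = [H1[r][r2] + V[c + 1][r2] for r2 in range(R)]
            pi[c][r] = int(np.argmax(scores))
            V[c][r] = scores[pi[c][r]]
    return V, pi


def path_value(rows):
    """Non-Markovian value: sum of H[Z_{x_{i+1}} | Z_{x_{0:i}}] along a path."""
    hist, total = [loc(0, rows[0])], 0.0
    for c, r in enumerate(rows[1:], start=1):
        total += post_entropy(loc(c, r), hist)
        hist.append(loc(c, r))
    return total


def best_nonmarkov(r0):
    """Exhaustive search over all R**(C-1) paths: exponential in the horizon."""
    tails = itertools.product(range(R), repeat=C - 1)
    return max(path_value((r0,) + tail) for tail in tails)


V, pi = markov_policy()
r0 = 0
rows = [r0]
for c in range(C - 1):
    rows.append(pi[c][rows[-1]])

v_markov = V[0][r0]          # Markov value V_0(x_0)
v_pi = path_value(rows)      # value of policy pi evaluated with full history
v_star = best_nonmarkov(r0)  # optimal non-Markovian value
print(v_pi, v_star, v_markov)
```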



A.2 Proof of Lemma 2



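Let $\Sigma_{x_{0:i-1}x_{0:i-1}|x_i} = C + E$, where $C$ is defined to be a matrix with diagonal components $\sigma^2_{x_k} = \sigma_s^2 + \sigma_n^2$ for $k = 0, \ldots, i-1$ and off-diagonal components $0$, and $E$ is defined to be a matrix with diagonal components $-(\sigma_{x_k x_i})^2/\sigma^2_{x_i} = -(\sigma_{x_k x_i})^2/(\sigma_s^2 + \sigma_n^2)$ for $k = 0, \ldots, i-1$ and the same off-diagonal components as $\Sigma_{x_{0:i-1}x_{0:i-1}|x_i}$ (i.e., $\sigma_{x_j x_k|x_i} = \sigma_{x_j x_k} - \sigma_{x_j x_i}\sigma_{x_i x_k}/\sigma^2_{x_i}$ for $j, k = 0, \ldots, i-1$, $j \neq k$). Then,

$$\|C^{-1}\|_2 = \frac{1}{\lambda_{\min}(C)} = \frac{1}{\sigma_s^2 + \sigma_n^2}. \qquad (15)$$

The last equality follows from $\sigma_s^2 + \sigma_n^2$ being the smallest eigenvalue of $C$. So, $1/(\sigma_s^2 + \sigma_n^2)$ is the largest eigenvalue of $C^{-1}$, which is equal to $\|C^{-1}\|_2$.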
Note that the minimum distance between any pair of location components of $x_{0:i-1}$ cannot be less than $\omega_1$. So, it can be observed that no component of $E$ can have an absolute value more than $\sigma_s^2\xi$. Therefore,

$$\|E\|_2 \le i\,\sigma_s^2\,\xi, \qquad (16)$$

which follows from a property of the matrix 2-norm: $\|E\|_2$ cannot be more than the largest absolute component of $E$ multiplied by $i$ [12].

Note that the minimum distance between locations $x_i$ and $x_{i+1}$, as well as between location $x_i$ and any location component of $x_{0:i-1}$, cannot be less than $\omega_1$. So, it can be observed that no component of $\Sigma_{x_{i+1}x_{0:i-1}|x_i}$ can have an absolute value more than $\sigma_s^2\xi^2$. Therefore,

$$|\sigma_{x_{i+1}x_k|x_i}| \le \sigma_s^2\,\xi^2 \qquad (17)$$

for $k = 0, \ldots, i-1$.



Now,

$$
\begin{aligned}
\left|\Sigma_{x_{i+1}x_{0:i-1}|x_i}\left((C+E)^{-1} - C^{-1}\right)\Sigma_{x_{0:i-1}x_{i+1}|x_i}\right|
&\le \|\Sigma_{x_{i+1}x_{0:i-1}|x_i}\|_2^2\,\|(C+E)^{-1} - C^{-1}\|_2\\
&\le \Big(\sum_{k=0}^{i-1}\sigma^2_{x_{i+1}x_k|x_i}\Big)\frac{\|C^{-1}\|_2^2\,\|E\|_2}{1 - \|C^{-1}\|_2\|E\|_2}\\
&\le i\,\sigma_s^4\xi^4\,\frac{\|C^{-1}\|_2^2\,\|E\|_2}{1 - \|C^{-1}\|_2\|E\|_2}.
\end{aligned}
\qquad (18)
$$

The first inequality is due to the Cauchy-Schwarz inequality and submultiplicativity of the matrix norm [12]. The second inequality follows from an important result in the perturbation theory of matrix inverses (in particular, Theorem III.2.5 in [12]). It requires the assumption $\|C^{-1}E\|_2 < 1$. This assumption is satisfied whenever $\|C^{-1}\|_2\|E\|_2 < 1$ because $\|C^{-1}E\|_2 \le \|C^{-1}\|_2\|E\|_2$. By (15) and (16), $\|C^{-1}\|_2\|E\|_2 < 1$ translates to $\xi < \rho/i$. The last inequality is due to (17).
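The perturbation bound used in the second inequality of (18) can be checked numerically. The sketch below builds $C = (\sigma_s^2 + \sigma_n^2)I$ as in the proof and, as a stand-in for the posterior covariance terms, a small random symmetric $E$ (an assumption made only for illustration; all names are hypothetical). It checks (15), the norm property behind (16), the hypothesis $\|C^{-1}\|_2\|E\|_2 < 1$, and the inequality $\|(C+E)^{-1} - C^{-1}\|_2 \le \|C^{-1}\|_2^2\|E\|_2/(1 - \|C^{-1}\|_2\|E\|_2)$, which follows from $(C+E)^{-1} - C^{-1} = -(C+E)^{-1}EC^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(0)
i, sig_s2, sig_n2 = 6, 1.0, 0.1

# C: diagonal part, as in the proof; E: a small symmetric perturbation
# (random here, standing in for the posterior covariance terms).
C = (sig_s2 + sig_n2) * np.eye(i)
B = rng.uniform(-1.0, 1.0, size=(i, i))
E = 0.02 * (B + B.T)

C_inv = np.linalg.inv(C)
norm_C_inv = np.linalg.norm(C_inv, 2)  # spectral norm, cf. (15)
norm_E = np.linalg.norm(E, 2)
max_abs_E = np.abs(E).max()

# Perturbation bound for matrix inverses, valid when ||C^-1||_2 ||E||_2 < 1.
lhs = np.linalg.norm(np.linalg.inv(C + E) - C_inv, 2)
rhs = norm_C_inv**2 * norm_E / (1.0 - norm_C_inv * norm_E)
print(f"||(C+E)^-1 - C^-1||_2 = {lhs:.3e} <= {rhs:.3e}")
```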
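From (18),

$$
\begin{aligned}
\Sigma_{x_{i+1}x_{0:i-1}|x_i}\Sigma^{-1}_{x_{0:i-1}x_{0:i-1}|x_i}\Sigma_{x_{0:i-1}x_{i+1}|x_i}
&\le \Sigma_{x_{i+1}x_{0:i-1}|x_i}C^{-1}\Sigma_{x_{0:i-1}x_{i+1}|x_i} + i\,\sigma_s^4\xi^4\,\frac{\|C^{-1}\|_2^2\,\|E\|_2}{1 - \|C^{-1}\|_2\|E\|_2}\\
&\le i\,\sigma_s^4\xi^4\,\|C^{-1}\|_2\left(1 + \frac{\|C^{-1}\|_2\|E\|_2}{1 - \|C^{-1}\|_2\|E\|_2}\right)\\
&= \frac{i\,\sigma_s^4\xi^4\,\|C^{-1}\|_2}{1 - \|C^{-1}\|_2\|E\|_2}\\
&\le \frac{i\,\sigma_s^4\xi^4}{\sigma_s^2 + \sigma_n^2 - i\,\sigma_s^2\xi}.
\end{aligned}
$$

The second inequality is due to

$$\Sigma_{x_{i+1}x_{0:i-1}|x_i}C^{-1}\Sigma_{x_{0:i-1}x_{i+1}|x_i} \le \|C^{-1}\|_2\sum_{k=0}^{i-1}\sigma^2_{x_{i+1}x_k|x_i} \le i\,\sigma_s^4\xi^4\,\|C^{-1}\|_2, \qquad (19)$$

which follows from the Cauchy-Schwarz inequality and (17). The third inequality follows from (15) and (16).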

We will need the following property of posterior variance that is similar to (3):

$$\sigma^2_{x_{i+1}|x_{0:i}} = \sigma^2_{x_{i+1}|x_i} - \Sigma_{x_{i+1}x_{0:i-1}|x_i}\Sigma^{-1}_{x_{0:i-1}x_{0:i-1}|x_i}\Sigma_{x_{0:i-1}x_{i+1}|x_i}, \qquad (20)$$

where $\Sigma_{x_{i+1}x_{0:i-1}|x_i}$ is a posterior covariance vector with components $\sigma_{x_{i+1}x_k|x_i}$ for $k = 0, \ldots, i-1$, $\Sigma_{x_{0:i-1}x_{i+1}|x_i}$ is the transpose of $\Sigma_{x_{i+1}x_{0:i-1}|x_i}$, and $\Sigma_{x_{0:i-1}x_{0:i-1}|x_i}$ is a posterior covariance matrix with components $\sigma_{x_j x_k|x_i}$ for $j, k = 0, \ldots, i-1$.
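Property (20) is an instance of iterated Gaussian conditioning: conditioning on $x_i$ first and then on $x_{0:i-1}$ gives the same posterior variance as conditioning on $x_{0:i}$ at once. The sketch below checks this on a random positive-definite covariance; the index layout and the helper `cond_cov` are hypothetical and chosen only for illustration.

```python
import numpy as np


def cond_cov(S, keep, given):
    """Posterior covariance of indices `keep` given indices `given` (Gaussian)."""
    S_kk = S[np.ix_(keep, keep)]
    S_kg = S[np.ix_(keep, given)]
    S_gg = S[np.ix_(given, given)]
    return S_kk - S_kg @ np.linalg.solve(S_gg, S_kg.T)


rng = np.random.default_rng(1)
A = rng.normal(size=(5, 5))
S = A @ A.T + 5.0 * np.eye(5)  # generic positive-definite prior covariance

# Index layout (hypothetical): 0 -> x_{i+1}, 1 -> x_i, 2..4 -> x_{0:i-1}.
post = cond_cov(S, [0, 2, 3, 4], [1])  # everything conditioned on x_i
var_next = post[0, 0]                  # sigma^2_{x_{i+1} | x_i}
cross = post[0, 1:]                    # Sigma_{x_{i+1} x_{0:i-1} | x_i}
inner = post[1:, 1:]                   # Sigma_{x_{0:i-1} x_{0:i-1} | x_i}

rhs = var_next - cross @ np.linalg.solve(inner, cross)  # right-hand side of (20)
lhs = cond_cov(S, [0], [1, 2, 3, 4])[0, 0]              # sigma^2_{x_{i+1} | x_{0:i}}
print(lhs, rhs)
```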
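By the above inequality and (20),

$$\sigma^2_{x_{i+1}|x_i} - \sigma^2_{x_{i+1}|x_{0:i}} = \Sigma_{x_{i+1}x_{0:i-1}|x_i}\Sigma^{-1}_{x_{0:i-1}x_{0:i-1}|x_i}\Sigma_{x_{0:i-1}x_{i+1}|x_i} \le \frac{i\,\sigma_s^4\xi^4}{\sigma_s^2 + \sigma_n^2 - i\,\sigma_s^2\xi}.$$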



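A.3 Proof of Theorem 4

Proof by induction on $i$ that

$$V^{\pi^*}_i(x_{0:i}) \le V_i(x_i) \le V^{\pi^*}_i(x_{0:i}) + \sum_{s=i}^{t}\Delta(s) \qquad (21)$$

for $i = t, \ldots, 0$. Throughout, $x_{i+1} = T(x_i, a_i)$.

Base case ($i = t$): By Lemma 3,

$$
\begin{aligned}
&\mathbb{H}[Z_{x_{t+1}} \mid Z_{x_{0:t}}] \le \mathbb{H}[Z_{x_{t+1}} \mid Z_{x_t}] \le \mathbb{H}[Z_{x_{t+1}} \mid Z_{x_{0:t}}] + \Delta(t) \ \text{ for any } x_{t+1}\\
\Rightarrow\ &\max_{a_t\in\mathcal{A}(x_t)}\mathbb{H}[Z_{x_{t+1}} \mid Z_{x_{0:t}}] \le \max_{a_t\in\mathcal{A}(x_t)}\mathbb{H}[Z_{x_{t+1}} \mid Z_{x_t}] \le \max_{a_t\in\mathcal{A}(x_t)}\mathbb{H}[Z_{x_{t+1}} \mid Z_{x_{0:t}}] + \Delta(t)\\
\Rightarrow\ &V^{\pi^*}_t(x_{0:t}) \le V_t(x_t) \le V^{\pi^*}_t(x_{0:t}) + \Delta(t).
\end{aligned}
$$

Hence, the base case is true.

Inductive case: Suppose that

$$V^{\pi^*}_{i+1}(x_{0:i+1}) \le V_{i+1}(x_{i+1}) \le V^{\pi^*}_{i+1}(x_{0:i+1}) + \sum_{s=i+1}^{t}\Delta(s) \qquad (22)$$

is true. We have to prove that $V^{\pi^*}_i(x_{0:i}) \le V_i(x_i) \le V^{\pi^*}_i(x_{0:i}) + \sum_{s=i}^{t}\Delta(s)$ is true.

We will first show that $V_i(x_i) \le V^{\pi^*}_i(x_{0:i}) + \sum_{s=i}^{t}\Delta(s)$. By Lemma 3,

$$
\begin{aligned}
&\mathbb{H}[Z_{x_{i+1}} \mid Z_{x_i}] \le \mathbb{H}[Z_{x_{i+1}} \mid Z_{x_{0:i}}] + \Delta(i) \ \text{ for any } x_{i+1}\\
\Rightarrow\ &\mathbb{H}[Z_{x_{i+1}} \mid Z_{x_i}] + V_{i+1}(x_{i+1}) \le \mathbb{H}[Z_{x_{i+1}} \mid Z_{x_{0:i}}] + V^{\pi^*}_{i+1}(x_{0:i+1}) + \sum_{s=i}^{t}\Delta(s) \ \text{ by (22) for any } x_{i+1}\\
\Rightarrow\ &\max_{a_i\in\mathcal{A}(x_i)}\big(\mathbb{H}[Z_{x_{i+1}} \mid Z_{x_i}] + V_{i+1}(x_{i+1})\big) \le \max_{a_i\in\mathcal{A}(x_i)}\big(\mathbb{H}[Z_{x_{i+1}} \mid Z_{x_{0:i}}] + V^{\pi^*}_{i+1}(x_{0:i+1})\big) + \sum_{s=i}^{t}\Delta(s)\\
\Rightarrow\ &V_i(x_i) \le V^{\pi^*}_i(x_{0:i}) + \sum_{s=i}^{t}\Delta(s).
\end{aligned}
$$

We will now prove that $V^{\pi^*}_i(x_{0:i}) \le V_i(x_i)$. By Lemma 3,

$$
\begin{aligned}
&\mathbb{H}[Z_{x_{i+1}} \mid Z_{x_{0:i}}] \le \mathbb{H}[Z_{x_{i+1}} \mid Z_{x_i}] \ \text{ for any } x_{i+1}\\
\Rightarrow\ &\mathbb{H}[Z_{x_{i+1}} \mid Z_{x_{0:i}}] + V^{\pi^*}_{i+1}(x_{0:i+1}) \le \mathbb{H}[Z_{x_{i+1}} \mid Z_{x_i}] + V_{i+1}(x_{i+1}) \ \text{ by (22) for any } x_{i+1}\\
\Rightarrow\ &\max_{a_i\in\mathcal{A}(x_i)}\big(\mathbb{H}[Z_{x_{i+1}} \mid Z_{x_{0:i}}] + V^{\pi^*}_{i+1}(x_{0:i+1})\big) \le \max_{a_i\in\mathcal{A}(x_i)}\big(\mathbb{H}[Z_{x_{i+1}} \mid Z_{x_i}] + V_{i+1}(x_{i+1})\big)\\
\Rightarrow\ &V^{\pi^*}_i(x_{0:i}) \le V_i(x_i).
\end{aligned}
$$

Hence, the inductive case is true.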

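A.4 Proof of Theorem 5

The following lemma is needed for this proof:

Lemma 7. $V_i(x_i) \le V^{\pi}_i(x_{0:i}) + \sum_{s=i}^{t}\Delta(s)$ for $i = 0, \ldots, t$.

The proof of the above lemma is provided in Appendix A.6.

Proof by induction on $i$ that $V^{\pi^*}_i(x_{0:i}) \le V^{\pi}_i(x_{0:i}) + \sum_{s=i}^{t}\Delta(s)$ for $i = t, \ldots, 0$.

Base case ($i = t$):

$$V^{\pi^*}_t(x_{0:t}) \le V_t(x_t) \le V^{\pi}_t(x_{0:t}) + \Delta(t).$$

The first inequality is due to Theorem 4. The second inequality follows from Lemma 7. Hence, the base case is true.

Inductive case: Suppose that

$$V^{\pi^*}_{i+1}(x_{0:i+1}) \le V^{\pi}_{i+1}(x_{0:i+1}) + \sum_{s=i+1}^{t}\Delta(s) \qquad (23)$$

is true. We have to prove that $V^{\pi^*}_i(x_{0:i}) \le V^{\pi}_i(x_{0:i}) + \sum_{s=i}^{t}\Delta(s)$ is true.

$$
\begin{aligned}
V^{\pi^*}_i(x_{0:i}) &\le V_i(x_i)\\
&= \mathbb{H}[Z_{T(x_i,\pi_i(x_i))} \mid Z_{x_i}] + V_{i+1}(T(x_i,\pi_i(x_i)))\\
&\le \mathbb{H}[Z_{T(x_i,\pi_i(x_i))} \mid Z_{x_{0:i}}] + \Delta(i) + V_{i+1}(T(x_i,\pi_i(x_i)))\\
&\le \mathbb{H}[Z_{T(x_i,\pi_i(x_i))} \mid Z_{x_{0:i}}] + \Delta(i) + V^{\pi}_{i+1}\big((x_{0:i}, T(x_i,\pi_i(x_i)))\big) + \sum_{s=i+1}^{t}\Delta(s)\\
&= V^{\pi}_i(x_{0:i}) + \sum_{s=i}^{t}\Delta(s).
\end{aligned}
$$

The first inequality is due to Theorem 4. The first equality follows from the definitions of the Markov-based value function $V_i$ and policy $\pi$. The second inequality follows from Lemma 3. The third inequality is due to Lemma 7. The last equality follows from (5). Hence, the inductive case is true.

A.5 Proof Sketch of Lemma 6

Define $x_i^{(m)}$ to be the $m$-th component of vector $x_i$ of robot locations for $m = 1, \ldots, k$. Let $x_i^{(1:m)}$ denote a vector comprising the first $m$ components of $x_i$ (i.e., the concatenation of $x_i^{(1)}, \ldots, x_i^{(m)}$). Then,

$$
\begin{aligned}
\mathbb{I}[Z_{x_{i+1}}; Z_{x_{0:i-1}} \mid Z_{x_i}] &= \mathbb{H}[Z_{x_{i+1}} \mid Z_{x_i}] - \mathbb{H}[Z_{x_{i+1}} \mid Z_{x_{0:i}}]\\
&= \sum_{m=1}^{k}\Big(\mathbb{H}[Z_{x^{(m)}_{i+1}} \mid Z_{x_i}, Z_{x^{(1:m-1)}_{i+1}}] - \mathbb{H}[Z_{x^{(m)}_{i+1}} \mid Z_{x_{0:i}}, Z_{x^{(1:m-1)}_{i+1}}]\Big)\\
&= -\frac{1}{2}\sum_{m=1}^{k}\log\left(1 - \frac{\sigma^2_{x^{(m)}_{i+1}|x_i,x^{(1:m-1)}_{i+1}} - \sigma^2_{x^{(m)}_{i+1}|x_{0:i},x^{(1:m-1)}_{i+1}}}{\sigma^2_{x^{(m)}_{i+1}|x_i,x^{(1:m-1)}_{i+1}}}\right).
\end{aligned}
\qquad (24)
$$

The second equality follows from the chain rule for entropy.

Similar to Lemma 2, the following result bounds the variance reduction term $\sigma^2_{x^{(m)}_{i+1}|x_i,x^{(1:m-1)}_{i+1}} - \sigma^2_{x^{(m)}_{i+1}|x_{0:i},x^{(1:m-1)}_{i+1}}$ in (24):

Lemma 8. If $\xi < \min(\rho/(tk), \rho/(4k))$ and (…) is satisfied, then $\sigma^2_{x^{(m)}_{i+1}|x_i,x^{(1:m-1)}_{i+1}} - \sigma^2_{x^{(m)}_{i+1}|x_{0:i},x^{(1:m-1)}_{i+1}} \le$ …

The proof of the above result is largely similar to that of Lemma 2 (Appendix A.2), and is therefore omitted here.

The bounds on $\mathbb{I}[Z_{x_{i+1}}; Z_{x_{0:i-1}} \mid Z_{x_i}]$ follow immediately from (24), Lemma 8, and the following lower bound on $\sigma^2_{x^{(m)}_{i+1}|x_i,x^{(1:m-1)}_{i+1}}$:

The equality is due to (3). The first inequality is due to the Cauchy-Schwarz inequality, submultiplicativity of the matrix norm [12], and a result in the perturbation theory of matrix inverses (in particular, Theorem III.2.5 in [12]). The second inequality follows from the given satisfied condition.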
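The decomposition in (24) can be sanity-checked numerically: for Gaussian variables, the conditional mutual information computed from log-determinants should equal the per-robot sum $-\frac{1}{2}\sum_m \log(1 - (\sigma^2_a - \sigma^2_b)/\sigma^2_a)$, where $\sigma^2_a$ conditions on $x_i$ and the earlier components of $x_{i+1}$, and $\sigma^2_b$ additionally conditions on $x_{0:i-1}$. The sketch below does this for an assumed $k = 2$ and a random positive-definite covariance; the index layout and helper names are hypothetical.

```python
import numpy as np


def cond_cov(S, keep, given):
    """Posterior covariance of indices `keep` given indices `given`."""
    S_kg = S[np.ix_(keep, given)]
    return S[np.ix_(keep, keep)] - S_kg @ np.linalg.solve(S[np.ix_(given, given)], S_kg.T)


def logdet(M):
    return np.linalg.slogdet(M)[1]


rng = np.random.default_rng(2)
A = rng.normal(size=(6, 6))
S = A @ A.T + 6.0 * np.eye(6)

# Hypothetical index layout for k = 2 robots: x_{i+1}, x_i, x_{0:i-1}.
nxt, cur, past = [0, 1], [2, 3], [4, 5]

# Conditional mutual information via log-determinants.
mi_direct = 0.5 * (logdet(cond_cov(S, nxt, cur)) - logdet(cond_cov(S, nxt, cur + past)))

# Last expression in (24): one robot at a time, m = 1..k.
mi_chain = 0.0
for m, j in enumerate(nxt):
    prev = nxt[:m]  # components x_{i+1}^{(1:m-1)}
    var_a = cond_cov(S, [j], cur + prev)[0, 0]
    var_b = cond_cov(S, [j], cur + past + prev)[0, 0]
    mi_chain += -0.5 * np.log(1.0 - (var_a - var_b) / var_a)
print(mi_direct, mi_chain)
```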

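A.6 Proof of Lemma 7

Proof by induction on $i$ that $V_i(x_i) \le V^{\pi}_i(x_{0:i}) + \sum_{s=i}^{t}\Delta(s)$ for $i = t, \ldots, 0$.

Base case ($i = t$):

$$
\begin{aligned}
V_t(x_t) &= \max_{a_t\in\mathcal{A}(x_t)}\mathbb{H}[Z_{T(x_t,a_t)} \mid Z_{x_t}]\\
&\le \mathbb{H}[Z_{T(x_t,\pi_t(x_t))} \mid Z_{x_{0:t}}] + \Delta(t)\\
&= V^{\pi}_t(x_{0:t}) + \Delta(t).
\end{aligned}
$$

The first equality follows from (11). The inequality follows from Lemma 3, since $\pi_t(x_t)$ attains the maximum. The last equality is due to (5). So, the base case is true.

Inductive case: Suppose that

$$V_{i+1}(x_{i+1}) \le V^{\pi}_{i+1}(x_{0:i+1}) + \sum_{s=i+1}^{t}\Delta(s) \qquad (25)$$

is true. We have to prove that $V_i(x_i) \le V^{\pi}_i(x_{0:i}) + \sum_{s=i}^{t}\Delta(s)$ is true.

By Lemma 3,

$$
\begin{aligned}
&\mathbb{H}[Z_{x_{i+1}} \mid Z_{x_i}] \le \mathbb{H}[Z_{x_{i+1}} \mid Z_{x_{0:i}}] + \Delta(i) \ \text{ for any } x_{i+1}\\
\Rightarrow\ &\mathbb{H}[Z_{x_{i+1}} \mid Z_{x_i}] + V_{i+1}(x_{i+1}) \le \mathbb{H}[Z_{x_{i+1}} \mid Z_{x_{0:i}}] + V^{\pi}_{i+1}(x_{0:i+1}) + \sum_{s=i}^{t}\Delta(s) \ \text{ by (25) for any } x_{i+1}\\
\Rightarrow\ &\mathbb{H}[Z_{T(x_i,\pi_i(x_i))} \mid Z_{x_i}] + V_{i+1}(T(x_i,\pi_i(x_i))) \le \mathbb{H}[Z_{T(x_i,\pi_i(x_i))} \mid Z_{x_{0:i}}] + V^{\pi}_{i+1}\big((x_{0:i}, T(x_i,\pi_i(x_i)))\big) + \sum_{s=i}^{t}\Delta(s) \ \text{ by } x_{i+1} \leftarrow T(x_i,\pi_i(x_i))\\
\Rightarrow\ &V_i(x_i) \le V^{\pi}_i(x_{0:i}) + \sum_{s=i}^{t}\Delta(s) \ \text{ by (11) and (5)}.
\end{aligned}
$$

Hence, the inductive case is true.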



